

AD  A  121  294 


UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE (BEFORE COMPLETING FORM)

1. REPORT NUMBER: AFOSR-TR-82-0933
2. GOVT ACCESSION NO.:
3. RECIPIENT'S CATALOG NUMBER:
4. TITLE (and Subtitle): THE ANALYSIS AND DESIGN OF ROBUST NONLINEAR ESTIMATORS AND ROBUST SIGNAL CODING SCHEMES
5. TYPE OF REPORT & PERIOD COVERED: FINAL, 15 JUN 78-14 JUN 82
6. PERFORMING ORG. REPORT NUMBER:
7. AUTHOR(s): Neal C. Gallagher, Jr.
8. CONTRACT OR GRANT NUMBER(s): AFOSR-78-3605
9. PERFORMING ORGANIZATION NAME AND ADDRESS: School of Electrical Engineering, Purdue University, West Lafayette IN 47907
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: PE61102F; 2304/A6
11. CONTROLLING OFFICE NAME AND ADDRESS: Directorate of Mathematical & Information Sciences, Air Force Office of Scientific Research, Bolling AFB DC 20332
12. REPORT DATE: 16 SEP 82
13. NUMBER OF PAGES: 155
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
15. SECURITY CLASS. (of this report): UNCLASSIFIED
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release; distribution unlimited.
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report):
18. SUPPLEMENTARY NOTES:
19. KEY WORDS (Continue on reverse side if necessary and identify by block number): Source coding; estimation; quantization; median filtering.

DTIC stamp: NOV 08 1982

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

Two topics of engineering interest have been treated in this research. One is
block or vector quantization, which deals with the digital representation of
multi-dimensional signals. Two Ph.D. dissertations and one patent application,
in addition to numerous technical articles, have resulted from this work. The
other area of study has been nonlinear signal estimation, which has led to a
study of median filtering. This work on median filtering has resulted in two
Ph.D. dissertations in addition to a number of technical publications.



Final  Report 

Air  Force  Office  of  Scientific  Research 
Grant  No. 
AFOSR-78-3605 




I.  General 

The four year duration of this grant has resulted in twenty-five technical
publications in the general areas of signal estimation and source coding. A copy
of each publication is found in the Appendix. In addition to these publications,
one patent application has been filed dealing with a novel method of
multi-dimensional quantization.

In addition to the publications noted above, this project has resulted in
the graduation of four Ph.D. students who have been supported in whole or in part
through this grant. Two of these dissertations, one by Jim Bucklew and one by Kerry
Rines, deal with the analysis and design of block quantizers. The remaining two
dissertations, by Gonzalo Arce and Tom Nodes, treat properties of median filters.
A fifth dissertation by Tom McCannon is still being researched. This research
concerns the design of nonlinear estimators and predictors. Here we present a brief
description of the technical results; the detailed discussion is contained in the
attached reprints.

The work on multidimensional quantizers began with a search to find better ways
of quantizing multidimensional vectors. We started with a study of vectors with
Gaussian distributions and then generalized to circularly symmetric distributions.
We developed new derivations for bounds on quantizer performance. Finally, we
developed a very simple procedure by which to implement the known optimum quantizer
structures. This procedure has been the subject of a patent application.

Our work in nonlinear estimation began with a study of estimation schemes which
used an extended form of the projection theorem in their design. We combined
polynomial operations with linear operations in the estimator design.

Our work led to an investigation of the properties of median filters. Our
initial interest in the median filter arose because median methods work well in
many situations where linear estimators are nearly useless. The problem with
median filters (and therefore our opportunity) has been the almost complete lack
of theory on their properties and for their design. We have viewed this as a
chance to make a significant contribution in this relatively new field of median
methods. We believe we have made several major contributions to the analysis of
median filters, as illustrated by two Ph.D. dissertations and a number of invited
technical presentations on the topic of median filters. Copies of these
dissertations will be mailed as separate technical reports.


APPENDIX

REPRINTS OF TECHNICAL PAPERS

1. A NOVEL APPROACH FOR DESIGNING NONLINEAR DISCRETE TIME FILTERS: PART I
2. A NOVEL APPROACH FOR DESIGNING NONLINEAR DISCRETE TIME FILTERS: PART II
3. QUANTIZATION OF BIVARIATE CIRCULARLY SYMMETRIC DENSITIES
4. QUANTIZATION IN SPECTRAL PHASE CODING
5. A NOTE ON OPTIMAL QUANTIZATION
6. SOME PROPERTIES OF UNIFORM STEP SIZE QUANTIZERS*
7. ON THE DETERMINATION OF REGRESSION FUNCTIONS
8. QUANTIZATION SCHEMES FOR BIVARIATE GAUSSIAN RANDOM VARIABLES
9. TWO-DIMENSIONAL QUANTIZATION OF BIVARIATE CIRCULARLY SYMMETRIC DENSITIES
10. SOME RESULTS IN MULTIDIMENSIONAL QUANTIZATION THEORY*
11. SOME RECENT DEVELOPMENTS IN QUANTIZATION THEORY*
12. PASSBAND AND STOPBAND PROPERTIES OF MEDIAN FILTERS*
13. ROOT-SIGNAL SET ANALYSIS FOR MEDIAN FILTERS
14. SOME PROPERTIES OF UNIFORM STEP SIZE QUANTIZERS
15. SOME MODIFICATIONS TO THE MEDIAN FILTER PROCESS AND THEIR PROPERTIES*
16. THE DESIGN OF MULTIDIMENSIONAL QUANTIZERS USING PREQUANTIZATION
17. A NOVEL APPROACH FOR THE COMPUTATION OF ORTHONORMAL POLYNOMIAL EXPANSIONS
18. SOME RESULTS ON THE MEDIAN FILTERING OF SIGNALS AND ADDITIVE WHITE NOISE*
19. ON A CLASS OF RANDOM PROCESSES EXHIBITING OPTIMAL NONLINEAR ONE-STEP PREDICTORS
20. A THEORETICAL ANALYSIS OF THE PROPERTIES OF MEDIAN FILTERS
21. PROPERTIES OF MINIMUM MEAN SQUARED ERROR BLOCK QUANTIZERS
22. NONUNIFORM MULTIDIMENSIONAL QUANTIZATION
23. A NOTE ON THE COMPUTATION OF OPTIMAL MINIMUM MEAN-SQUARE ERROR QUANTIZERS
24. ON THE DESIGN OF NONLINEAR DISCRETE-TIME PREDICTORS
25. THE DESIGN OF TWO-DIMENSIONAL QUANTIZERS USING PREQUANTIZATION



A  NOVEL  APPROACH  FOR  DESIGNING  NONLINEAR  DISCRETE  TIME  FILTERS:  PART  I 


D. MINOO-HAMEDANI and G.L. WISE
Department  of  Electrical  Engineering 
University  of  Texas  at  Austin 
Austin,  Texas  78712 

and 


N.C.  GALLAGHER  and  T.E.  McCANNON 
School  of  Electrical  Engineering 
Purdue  University 
West Lafayette, Indiana 47907

ABSTRACT 


The problem of minimum mean squared error prediction of a discrete time
random process using a nonlinear filter consisting of a zero memory
nonlinearity followed by a linear filter is studied. Classes of random
processes for which the best predictor is realizable using a nonlinear filter of
the above form are discussed. For those random processes for which the
best predictor is not realizable using the above nonlinear filter, an
iterative procedure is presented for finding a suboptimal nonlinear filter.

I. INTRODUCTION


In this paper we consider a second order random process $\{X_n,\ n=1,2,\ldots\}$,
and we are interested in predicting the random variable $X_{N+1}$ from an
observation of $X_1,\ldots,X_N$. Our estimate is denoted by $\hat{X}_{N+1}$, and we
wish to choose it so as to minimize the mean squared error.

It is well known [1, pp. 77-78] that the optimal estimate of $X_{N+1}$ in
terms of $X_1,\ldots,X_N$ is given by the conditional expectation

$$\hat{X}_{N+1} = E\{X_{N+1} \mid X_N,\ldots,X_1\}.$$

In general, this is a Borel measurable function of $X_1,\ldots,X_N$, and in many
cases an exact expression for this quantity is difficult to obtain. Often
we do not have the necessary statistical information to evaluate such a
quantity. Linear estimation has been widely studied [2], and it is well
known that the best linear estimate of $X_{N+1}$ given the observations
$X_1,\ldots,X_N$ is obtained by applying the Projection Theorem [1, pp. 150-155].
It is clear that in this case the only statistical information required is the
second moment characteristics of the random process.

In this paper we restrict our estimate $\hat{X}_{N+1}$ to be of a form that is
expressible as the output of a system consisting of a zero memory
nonlinearity (ZNL) followed by a linear filter. The ZNL is characterized by a
Borel measurable function $g(\cdot)$ such that $g(X_1),\ldots,g(X_{N+1})$ are second
order random variables. If the weighting sequence of the linear filter is given
by $h_0,\ldots,h_{N-1}$, then the estimate is given by

$$\hat{X}_{N+1} = \sum_{n=1}^{N} g(X_n)\, h_{N-n}. \qquad (1)$$

Presented at the Sixteenth Annual Allerton Conference on Communication,
Control, and Computing, October 4-6, 1978; to be published in the
Proceedings of the Conference.


We wish to determine a function $g(\cdot)$ and a set of coefficients
$h_0,\ldots,h_{N-1}$ in such a way that the resulting mean squared error is
minimized. With this form of an estimate, we are guaranteed that the performance
can be at least as good as that of the optimal linear filter.
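As a minimal sketch of the structure in Eq. (1), the following Python function applies a zero memory nonlinearity to the observations and then forms the weighted sum. The function name `znl_linear_estimate` and the array layout (observations ordered $X_1,\ldots,X_N$, weights ordered $h_0,\ldots,h_{N-1}$) are assumptions made for illustration, not part of the original paper.

```python
import numpy as np

def znl_linear_estimate(x, h, g):
    """Evaluate Eq. (1): sum over n = 1..N of g(X_n) * h_{N-n}.

    x : observations [X_1, ..., X_N] (hypothetical layout)
    h : weighting sequence [h_0, ..., h_{N-1}]
    g : zero memory nonlinearity, applied elementwise
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    if x.shape != h.shape:
        raise ValueError("x and h must have the same length N")
    # h_0 multiplies g(X_N), h_1 multiplies g(X_{N-1}), and so on,
    # so the observations are reversed before the dot product.
    return float(np.dot(g(x[::-1]), h))

# With g(x) = x the structure reduces to an ordinary linear filter.
linear = znl_linear_estimate([1.0, 2.0, 3.0], [0.5, 0.25, 0.125], lambda v: v)
cubic = znl_linear_estimate([1.0, 2.0, 3.0], [0.5, 0.25, 0.125], lambda v: v ** 3)
```

Choosing $g(x)=x$ recovers the linear filter, which is why a filter of this form can always do at least as well as the optimal linear filter.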

In Section II we consider some cases where the optimal estimate has
the form of Eq. (1). In the general case the optimal predictor will not
have the form of Eq. (1) and thus a predictor of this form will be suboptimal.
This situation is discussed in Section III where an iterative scheme is
presented for determining suboptimal predictors. In Section IV examples
are given to illustrate the method.

II.  OPTIMAL  PREDICTION 


In this section we consider some cases where the optimal filter has
the form of Eq. (1). Whenever the optimal filter is linear, then it obviously
has the form of Eq. (1) with $g(x)=x$. The class of spherically invariant
random processes [3] admits linear solutions, with the most well-known
examples being the Gaussian processes.

It is clear that the performance of the filter given by Eq. (1) can
always be made at least as good as that of the optimal linear filter. In
some cases the filter given by Eq. (1) can be optimal while the optimal
linear filter is useless. For example, let $X_n = P_n(U)$ where $U$ is a random
variable uniformly distributed over $[-1,1]$ and $P_n(\cdot)$ is the n-th Legendre
polynomial. In this case, the sequence $\{X_n,\ n=1,2,\ldots\}$ is a sequence of
uncorrelated zero mean random variables and the optimal linear filter yields
an estimate which is zero. However, for $g(x)=P_{N+1}(x)$ and

$$h_n = \begin{cases} 1, & n = N-1 \\ 0, & n \neq N-1 \end{cases}$$

the filter of Eq. (1) gives the estimate $\hat{X}_{N+1} = X_{N+1}$, since
$X_1 = P_1(U) = U$. Numerous examples similar to this can easily be constructed.
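The Legendre example can be checked numerically. The sketch below, which assumes NumPy's `numpy.polynomial.Legendre` class and Gauss-Legendre quadrature, confirms that the cross moments $E\{X_{N+1}X_n\}$ vanish (so the best linear estimate is zero) while applying $g = P_{N+1}$ to $X_1 = U$ reproduces $X_{N+1}$. The choice $N = 4$ and the sample size are arbitrary illustration values.

```python
import numpy as np
from numpy.polynomial import Legendre
from numpy.polynomial.legendre import leggauss

def legendre_moment(m, n, order=64):
    """E{P_m(U) P_n(U)} for U uniform on [-1, 1], by Gauss-Legendre quadrature."""
    nodes, weights = leggauss(order)
    pm = Legendre.basis(m)(nodes)
    pn = Legendre.basis(n)(nodes)
    return 0.5 * float(np.sum(weights * pm * pn))  # the density of U is 1/2

N = 4  # number of observations (illustrative choice)

# Cross moments between X_{N+1} and each observation X_1..X_N:
# all vanish, so the optimal linear filter yields the zero estimate.
cross = max(abs(legendre_moment(N + 1, n)) for n in range(1, N + 1))

# ZNL filter: g = P_{N+1}, with the single nonzero weight placed on X_1.
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=1000)
x = {n: Legendre.basis(n)(u) for n in range(1, N + 2)}  # X_n = P_n(U)
x_hat = Legendre.basis(N + 1)(x[1])
max_err = float(np.max(np.abs(x_hat - x[N + 1])))
```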

When the process is a (first order) Markov process it is well known
[1, pp. 81-83] that $E\{X_{N+1} \mid X_N,\ldots,X_1\} = E\{X_{N+1} \mid X_N\}$ with
probability one (wp1). Thus a system of the form of Eq. (1) with a ZNL given by
$g(x) = E\{X_{N+1} \mid X_N = x\}$ and a weighting sequence given by

$$h_n = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0 \end{cases}$$

will yield the optimal estimate of $X_{N+1}$.

Markov processes serve as the model of many physical phenomena that
arise in practice. Often they are obtained as the solution of first order
stochastic difference equations of the form

$$X_{n+1} = g(X_n) + Z_{n+1},$$

where $g(\cdot)$ is a Borel measurable function and the sequence $\{Z_n\}$ is a
sequence of zero mean independent random variables independent of the
initial condition $X_0$. It is easily seen that in this case we will have

$$E\{X_{N+1} \mid X_N,\ldots,X_1\} = g(X_N) \quad \text{wp1}.$$
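To illustrate the Markov case, the sketch below simulates a first order stochastic difference equation and compares the mean squared error of the predictor $g(X_N)$ with that of the best single-tap linear predictor fitted to the same data. The particular nonlinearity $g(x) = \sin(3x)$, the noise standard deviation of 0.5, and the sample size are assumptions chosen only for illustration.

```python
import numpy as np

def g(x):
    """Hypothetical nonlinearity driving the difference equation."""
    return np.sin(3.0 * x)

rng = np.random.default_rng(1)
T, sigma = 50_000, 0.5
z = sigma * rng.standard_normal(T)  # zero mean independent driving noise

# Generate X_{n+1} = g(X_n) + Z_{n+1}.
x = np.empty(T)
x[0] = z[0]
for n in range(T - 1):
    x[n + 1] = g(x[n]) + z[n + 1]

past, future = x[:-1], x[1:]

# ZNL predictor with h_0 = 1: its error is exactly Z_{n+1}.
mse_znl = float(np.mean((future - g(past)) ** 2))

# Best linear (affine) predictor from X_n alone, fitted by least squares.
slope, intercept = np.polyfit(past, future, 1)
mse_lin = float(np.mean((future - (slope * past + intercept)) ** 2))
```

Here `mse_znl` estimates the noise variance $\sigma^2 = 0.25$, the smallest error any predictor can reach, while the linear predictor cannot follow the oscillating regression function.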

It is clear that for any random process for which

$$E\{X_{N+1} \mid X_N,\ldots,X_1\} = \sum_{n=1}^{N} g(X_n)\, h_{N-n} \quad \text{wp1}, \qquad (2)$$

a system of the form of Eq. (1) will produce the optimal estimate of $X_{N+1}$.

As another example of a process for which the conditional expectation has
the form of Eq. (2), consider the process generated by the following second
order stochastic difference equation:

$$X_{n+2} = h_0\, g(X_{n+1}) + h_1\, g(X_n) + Z_{n+2}, \qquad n = -1,0,1,2,\ldots \qquad (3)$$

where $g(\cdot)$ is a Borel measurable function and $\{Z_n\}$ is a sequence of zero
mean independent random variables independent of the initial conditions
$X_{-1}$ and $X_0$. It can be easily seen that for this example, for any $N>2$,

$$E\{X_{N+1} \mid X_N,\ldots,X_1\} = h_0\, g(X_N) + h_1\, g(X_{N-1}) \quad \text{wp1}.$$

Extension of this example to the case where Eq. (3) is a k-th order
stochastic difference equation is obvious.

To obtain a characterization of a random process for which a form of
Eq. (2) holds, we use a theorem due to Balakrishnan [4].

Theorem (Balakrishnan): Let $C_{N+1}(t_1,\ldots,t_{N+1})$ denote the joint
characteristic function of the random variables $X_1,\ldots,X_{N+1}$. Assume that
the moments of all orders of the random variables exist, so that $C_{N+1}(\cdot)$ has
derivatives of all orders. Let $D_k$ denote the differential operator
$\partial(\cdot)/\partial(i t_k)$, so that

$$D_k C_{N+1}(t_1,\ldots,t_{N+1}) = \frac{\partial}{\partial (i t_k)} C_{N+1}(t_1,\ldots,t_{N+1}).$$

Let $P(x_1,\ldots,x_N)$ be a polynomial in N variables. Then a necessary and
sufficient condition for

$$E\{(X_{N+1})^M \mid X_N,\ldots,X_1\} = P(X_1,\ldots,X_N) \quad \text{wp1}$$

is that

$$\left.\frac{\partial^M}{\partial (i t_{N+1})^M} C_{N+1}(t_1,\ldots,t_{N+1})\right|_{t_{N+1}=0} = P(D_1,\ldots,D_N)\, C_{N+1}(t_1,\ldots,t_N,0).$$


Now, in the above theorem let $M=1$ and let $g(\cdot)$ be a polynomial of
degree d, i.e.

$$g(x) = \sum_{j=0}^{d} a_j x^j \qquad (4)$$

and assume $P(x_1,\ldots,x_N)$ has the form

$$P(x_1,\ldots,x_N) = \sum_{n=1}^{N} h_{N-n}\, g(x_n) = \sum_{n=1}^{N}\sum_{j=0}^{d} h_{N-n}\, a_j\, x_n^j.$$

Assume that the random variables in the process possess moments of all
orders. Then a necessary and sufficient condition for Eq. (2) to hold,
where $g(\cdot)$ is given by Eq. (4), is that

$$\left.\frac{\partial}{\partial (i t_{N+1})} C_{N+1}(t_1,\ldots,t_{N+1})\right|_{t_{N+1}=0} = \sum_{n=1}^{N}\sum_{j=0}^{d} h_{N-n}\, a_j\, D_n^j\, C_{N+1}(t_1,\ldots,t_N,0).$$

This result is of limited practical usefulness, because one often does not
have the necessary statistical information available.


III. SUBOPTIMAL PREDICTION

In the general case there will not exist a function $g(\cdot)$ and a
weighting sequence $h_0,\ldots,h_{N-1}$ such that Eq. (2) is satisfied. However, it
is quite reasonable to conjecture that in many cases it may be possible to
determine a filter having the form of Eq. (1) with a mean squared error
either significantly smaller than that associated with the optimal linear
filter or very close to the mean squared error associated with the optimal
filter.

Once we assume that the function $g(\cdot)$ that minimizes the mean squared
error is known, the $g(X_n)$'s will be well defined random variables and the
determination of the $h_n$'s that minimize the mean squared error reduces to
an application of the Projection Theorem, i.e. setting

$$E\Big\{\Big[X_{N+1} - \sum_{n=1}^{N} h_{N-n}\, g(X_n)\Big]\, g(X_m)\Big\} = 0, \qquad m = 1,\ldots,N,$$

and solving for the $h_n$'s. To carry out this step we need to calculate the
terms $E\{g(X_n)g(X_m)\}$ and $E\{X_{N+1}g(X_n)\}$. The difficult problem is the
determination of the function $g(\cdot)$ that minimizes the mean squared error.
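For a fixed $g(\cdot)$, the Projection Theorem step amounts to solving N linear equations in the weights. The following sketch replaces the expectations $E\{g(X_n)g(X_m)\}$ and $E\{X_{N+1}g(X_n)\}$ with sample averages, which is an assumption of this illustration (the paper works with exact moments). The function name `projection_weights` and the data layout are hypothetical.

```python
import numpy as np

def projection_weights(gx, target):
    """Solve the normal equations for the weighting sequence.

    gx     : array of shape (T, N); column n-1 holds samples of g(X_n)
    target : array of shape (T,); samples of X_{N+1}
    Returns [h_0, ..., h_{N-1}].
    """
    gx = np.asarray(gx, dtype=float)
    target = np.asarray(target, dtype=float)
    gram = gx.T @ gx / len(gx)        # sample estimate of E{g(X_n) g(X_m)}
    cross = gx.T @ target / len(gx)   # sample estimate of E{X_{N+1} g(X_m)}
    c = np.linalg.solve(gram, cross)  # c[n-1] is the weight on g(X_n), i.e. h_{N-n}
    return c[::-1]                    # reorder so that index 0 is h_0

# Synthetic check: the target is an exact weighted sum of the g(X_n) columns.
rng = np.random.default_rng(2)
gx_demo = rng.standard_normal((2000, 3))
target_demo = gx_demo @ np.array([0.2, -0.5, 1.0])
h_demo = projection_weights(gx_demo, target_demo)
```

Because the weight on $g(X_N)$ is $h_0$, the recovered sequence in this synthetic check is the reverse of the coefficients used to build the target.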

Notice that, in the optimization problem where the filter is constrained
to be of the form in Eq. (1), only second order information (i.e. the family
of bivariate distributions) is required. This is more statistical
information than is required if we were doing optimal linear filtering, which
requires second moment information. However, it is still considerably
less statistical information than is required if we were doing optimal
filtering, which requires statistical information pertaining to an (N+1)-st
dimensional distribution.

In order to circumvent the difficult problem of determining the
function $g(\cdot)$ to use in Eq. (1), we will parameterize $g(\cdot)$ and thus let the
determination of $g(\cdot)$ simply depend upon finding the correct parameters.

Doing so, we would then write the resulting mean squared error as a function
of the parameters associated with $g(\cdot)$ and the weighting sequence of the
linear filter. In this case, the mean squared error would be a function of
K+N parameters, where K is the number of parameters associated with $g(\cdot)$.

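For example, let $g(\cdot)$ be given by

$$g(x) = \sum_{j=1}^{K} a_j\, b_j(x).$$

Then our estimate is given by

$$\hat{X}_{N+1} = \sum_{n=1}^{N}\sum_{j=1}^{K} h_{N-n}\, a_j\, b_j(X_n)$$

and the resulting mean squared error is given by

$$E\{[X_{N+1} - \hat{X}_{N+1}]^2\} = E\{[X_{N+1}]^2\} - 2\sum_{n=1}^{N}\sum_{j=1}^{K} h_{N-n}\, a_j\, E\{X_{N+1}\, b_j(X_n)\}$$
$$\qquad + \sum_{n=1}^{N}\sum_{m=1}^{N}\sum_{j=1}^{K}\sum_{k=1}^{K} h_{N-n}\, h_{N-m}\, a_j\, a_k\, E\{b_j(X_n)\, b_k(X_m)\}. \qquad (5)$$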

The functions $b_j(\cdot)$ should be determined so that there is considerable
flexibility in the functional form of $g(\cdot)$ and also so that the expectations
in Eq. (5) could be determined from the statistical information at hand. For
example, if $b_j(x)=x^j$, then the necessary statistical information would
consist of the higher order joint moments.

The next step might be to minimize Eq. (5) over the N+K parameters.
This would result in N+K equations of third order polynomials in the
parameters. This simultaneous optimization over all the parameters presents
potential numerical problems. As an alternative to the simultaneous
optimization over all the parameters, we will now describe an iterative
technique.

The basic plan of the iterative technique is to consider the two sets
of parameters separately and to iteratively optimize over one set of
parameters while holding the other set fixed. This iterative technique results
in the need to solve systems of linear equations, as opposed to the need to
solve systems of equations in third order polynomials such as encountered
in the effort to simultaneously optimize over all the parameters.

We will assume that the parametric form of $g(\cdot)$ is such that with the
proper choice of parameters we could have $g(x)=x$. In this way the mean
squared error that results will always be upper bounded by the mean squared
error associated with the optimal linear filter.

The iterative technique is as follows:

Step 1. Determine the optimal weighting sequence
$h_0,\ldots,h_{N-1}$ for the case where $g(x)=x$.

Step 2. Evaluate the resulting mean squared error.

Step 3. For this choice of $h_0,\ldots,h_{N-1}$, determine
$a_1,\ldots,a_K$ so as to minimize the mean
squared error.

Step 4. For this choice of $a_1,\ldots,a_K$, determine
the optimal weighting sequence $h_0,\ldots,h_{N-1}$.

Step 5. Repeat Steps 3 and 4 until the improvement
in the mean squared error is negligible.

The $a_1,\ldots,a_K$ and $h_0,\ldots,h_{N-1}$ that are obtained in Step 5 after the
termination of the iterations determine the system. Step 1 and Step 4 make
use of the Projection Theorem and result in

$$E\{X_{N+1}\, g(X_j)\} = \sum_{n=1}^{N} h_{N-n}\, E\{g(X_n)\, g(X_j)\}, \qquad j = 1,\ldots,N.$$

Step 2 makes use of Eq. (5). Step 3 also makes use of Eq. (5): setting its
derivative with respect to each $a_j$ to zero results in

$$\sum_{n=1}^{N}\sum_{m=1}^{N} h_{N-n}\, h_{N-m}\Big[2 a_j E\{b_j(X_n)\, b_j(X_m)\} + 2\sum_{k \neq j} a_k E\{b_j(X_n)\, b_k(X_m)\}\Big] = 2\sum_{n=1}^{N} h_{N-n}\, E\{X_{N+1}\, b_j(X_n)\}, \qquad j = 1,\ldots,K.$$
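The alternating structure of Steps 1 through 5 can be sketched as follows. This is an illustrative sample-based version: every expectation is replaced by an average over simulated data and each linear system is solved by least squares, which is an assumption of this sketch and not the exact-moment computation used in the paper. The function name `fit_znl_filter`, the array layout, and the test process ($X_n = Z_n^3$ with two observations and correlations 0.9 and 0.7) are chosen for illustration.

```python
import numpy as np

def fit_znl_filter(basis, target, a_init, n_iter=20, tol=1e-10):
    """Alternate between the weights h and the ZNL parameters a.

    basis  : array (T, N, K) with basis[t, n-1, j-1] = b_j(X_n) for sample t
    target : array (T,) with samples of X_{N+1}
    a_init : parameters for which g(x) = x (Step 1)
    Returns ([h_0..h_{N-1}], [a_1..a_K], list of in-sample mean squared errors).
    """
    a = np.asarray(a_init, dtype=float)

    def solve_weights(a):
        gx = basis @ a                                   # samples of g(X_n), shape (T, N)
        c, *_ = np.linalg.lstsq(gx, target, rcond=None)  # c[n-1] = h_{N-n}
        return c, float(np.mean((target - gx @ c) ** 2))

    c, mse = solve_weights(a)                            # Steps 1 and 2
    history = [mse]
    for _ in range(n_iter):
        feats = np.einsum("tnj,n->tj", basis, c)         # sum_n h_{N-n} b_j(X_n)
        a, *_ = np.linalg.lstsq(feats, target, rcond=None)  # Step 3
        c, mse = solve_weights(a)                        # Step 4
        history.append(mse)
        if history[-2] - history[-1] < tol:              # Step 5
            break
    return c[::-1], a, history

# Illustrative data: X_n = Z_n^3 with Gaussian Z_n, N = 2 observations.
rng = np.random.default_rng(0)
R = np.array([[1.0, 0.9, 0.7], [0.9, 1.0, 0.9], [0.7, 0.9, 1.0]])
z = rng.standard_normal((20_000, 3)) @ np.linalg.cholesky(R).T
x = z ** 3
obs = x[:, :2]
basis = np.stack([np.cbrt(obs), obs], axis=-1)           # b_1(x) = x^(1/3), b_2(x) = x
h, a, history = fit_znl_filter(basis, x[:, 2], a_init=[0.0, 1.0])
```

Each half-step is an exact least squares minimization over one block of parameters, so the in-sample error in `history` cannot increase from one iteration to the next. Only the product of the scales of `a` and `h` is determined, so individual coefficients may differ from another run by a common factor.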


IV.  EXAMPLES 


In this section we consider a particular parametric form for the ZNL
and a specific model for the random sequence. The iterative method described
earlier is used in this case to determine a filter of the form of Eq. (1).

We also determine the mean squared error resulting from use of the optimal
filter and that resulting from use of the optimal linear filter.
Performances of the filters are compared and it is seen that in several instances
the improvement in mean squared error of the suboptimal filter over that of
the optimal linear filter is a significant fraction of the corresponding
improvement of the optimal filter over that of the optimal linear filter.

Assume that we have knowledge of the regression function

$$r(x) = E\{X_{N+1} \mid X_N = x\}. \qquad (6)$$

Notice that if we choose $g(x)=r(x)$ and

$$h_n = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0, \end{cases}$$

then the estimate would be the same as that of the optimal filter based on
the most recent observation. If we were to use the Projection Theorem to
choose a different weighting sequence $\{h_n\}$, we might do better. It seems
reasonable to expect that if we were to parameterize $g(\cdot)$ in such a way that
by proper choice of the parameters we would have $g(x)=r(x)$, and then use
this parameterization of the ZNL in the iterative technique described
earlier, we might determine a system of the form of Eq. (1) exhibiting very
good performance. This is how we will choose the ZNL in this section.

As a model for the random sequence $\{X_n,\ n=1,2,\ldots\}$ we will assume that

$$X_n = (Z_n)^{2k+1}, \qquad (7)$$

where $\{Z_n,\ n=1,2,\ldots\}$ is a zero mean stationary Gaussian process with unit
variance and autocorrelation function $\rho(\cdot)$.

First we will derive an expression for the regression function of
Eq. (6) when the random sequence is given by Eq. (7). Using results in [5],
we have that

$$E\{X_{N+1} \mid X_N\} = E\{(Z_{N+1})^{2k+1} \mid Z_N\} = \sum_{n=0}^{\infty} [\rho(1)]^n\, b_n\, \theta_n(Z_N) = \sum_{n=0}^{\infty} [\rho(1)]^n\, b_n\, \theta_n\big((X_N)^{1/(2k+1)}\big),$$

where the series are mean square convergent, the constants $\{b_n\}$ are given by

$$b_n = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x^{2k+1}\, \theta_n(x) \exp\Big(-\frac{x^2}{2}\Big)\, dx, \qquad (8)$$

and $\theta_n$ is the n-th normalized Hermite polynomial given by

$$\theta_n(x) = \frac{(-1)^n}{\sqrt{n!}} \exp\Big(\frac{x^2}{2}\Big) \frac{d^n}{dx^n} \exp\Big(-\frac{x^2}{2}\Big).$$

We see from Eq. (8) that $b_n=0$ for $n>2k+1$ and, in fact, the $b_n$'s can be
obtained from the relation

$$x^{2k+1} = \sum_{n=0}^{2k+1} b_n\, \theta_n(x).$$


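For example, for k=1,

$$b_n = \begin{cases} 3, & n = 1 \\ \sqrt{6}, & n = 3 \\ 0, & n \neq 1,3 \end{cases}$$

and $r(x)$ is given by

$$r(x) = [\rho(1)]^3\, x + 3\rho(1)\big(1-[\rho(1)]^2\big)\, x^{1/3}.$$

For k=2,

$$b_n = \begin{cases} 15, & n = 1 \\ 10\sqrt{6}, & n = 3 \\ 2\sqrt{30}, & n = 5 \\ 0, & n \neq 1,3,5 \end{cases}$$

and

$$r(x) = [\rho(1)]^5\, x + 10[\rho(1)]^3\big(1-[\rho(1)]^2\big)\, x^{3/5} + 15\rho(1)\big(1-[\rho(1)]^2\big)^2\, x^{1/5}.$$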
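The closed forms for $r(x)$ can be cross-checked against a direct numerical evaluation of the conditional moment. The sketch below uses the fact that, given $Z_N = z$, the variable $Z_{N+1}$ is Gaussian with mean $\rho(1)z$ and variance $1-[\rho(1)]^2$, and integrates with Gauss-Hermite quadrature from NumPy's `hermegauss`. The function names and the test values of $x$ and $\rho(1)$ are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def odd_root(x, power):
    """Real-valued x**power for an odd-root exponent such as 1/3 or 3/5."""
    return np.sign(x) * np.abs(x) ** power

def regression_numeric(x, rho, k, order=40):
    """E{X_{N+1} | X_N = x} for X_n = Z_n^(2k+1), by Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(order)   # weight function exp(-w^2 / 2)
    z = odd_root(x, 1.0 / (2 * k + 1))   # recover Z_N from X_N
    cond = rho * z + np.sqrt(1.0 - rho ** 2) * nodes  # Z_{N+1} given Z_N = z
    return float(np.sum(weights * cond ** (2 * k + 1)) / np.sqrt(2.0 * np.pi))

def regression_closed_form(x, rho, k):
    """The k = 1 and k = 2 regression functions stated in the text."""
    if k == 1:
        return rho ** 3 * x + 3 * rho * (1 - rho ** 2) * odd_root(x, 1 / 3)
    if k == 2:
        return (rho ** 5 * x
                + 10 * rho ** 3 * (1 - rho ** 2) * odd_root(x, 3 / 5)
                + 15 * rho * (1 - rho ** 2) ** 2 * odd_root(x, 1 / 5))
    raise ValueError("closed form is only given for k = 1 and k = 2")

max_gap = max(
    abs(regression_numeric(x, 0.6, k) - regression_closed_form(x, 0.6, k))
    for k in (1, 2)
    for x in (-2.0, 0.5, 3.0)
)
```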


In general, for an arbitrary positive integer k, it is easily seen that $r(\cdot)$
has the form

$$r(x) = c_{k+1}\, x + c_k\, x^{(2k-1)/(2k+1)} + c_{k-1}\, x^{(2k-3)/(2k+1)} + \cdots + c_1\, x^{1/(2k+1)},$$

where the $c_i$'s are constants that can be determined using the above procedure.
Thus we choose the ZNL $g(\cdot)$ to be

$$g(x) = \sum_{i=1}^{k+1} a_i\, x^{(2i-1)/(2k+1)},$$

where the parameters $a_i$ are to be determined by the iterative procedure. In
utilizing the iterative procedure we encounter the need for the knowledge of
moments and joint moments of $\{Z_n\}$ (see [6]), which are given by

$$E\{(Z_n)^p\} = \begin{cases} 1\cdot 3\cdot 5\cdots(p-1), & p \text{ even} \\ 0, & p \text{ odd} \end{cases}$$

and, writing $\mu(r,s,i) = E\{(Z_n)^r (Z_{n+i})^s\}$,

$$\mu(r,s,i) = \begin{cases} (r+s-1)\,\rho(i)\,\mu(r-1,s-1,i) + (r-1)(s-1)\big(1-[\rho(i)]^2\big)\,\mu(r-2,s-2,i), & (r+s) \text{ even} \\ 0, & (r+s) \text{ odd.} \end{cases} \qquad (9)$$

Observing that $\mu(1,1,i)=\rho(i)$ and $\mu(2,2,i)=1+2[\rho(i)]^2$, all higher order
joint moments can be calculated using Eq. (9).
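A minimal sketch of the recursion in Eq. (9) follows. The base cases used when one exponent is zero (the single-variable Gaussian moments) are an assumption of this sketch that the recursion itself leaves implicit, and the function names are hypothetical.

```python
from functools import lru_cache

def gaussian_moment(p):
    """E{Z^p} for a zero mean, unit variance Gaussian: 1*3*5*...*(p-1) or 0."""
    if p % 2:
        return 0.0
    out = 1.0
    for m in range(p - 1, 0, -2):
        out *= m
    return out

def joint_moment(r, s, rho):
    """mu(r, s, i) = E{Z_n^r Z_{n+i}^s}, where rho stands for rho(i)."""
    @lru_cache(maxsize=None)
    def mu(r, s):
        if (r + s) % 2:
            return 0.0
        if r == 0:                       # assumed base case: single-variable moment
            return gaussian_moment(s)
        if s == 0:
            return gaussian_moment(r)
        value = (r + s - 1) * rho * mu(r - 1, s - 1)
        if r >= 2 and s >= 2:            # the second term vanishes when r = 1 or s = 1
            value += (r - 1) * (s - 1) * (1.0 - rho ** 2) * mu(r - 2, s - 2)
        return value
    return mu(r, s)

m11 = joint_moment(1, 1, 0.5)
m22 = joint_moment(2, 2, 0.5)
m33 = joint_moment(3, 3, 0.5)
```

For $X_n = (Z_n)^3$ the covariance needed by the optimal linear filter is $\mu(3,3,i) = 9\rho(i) + 6[\rho(i)]^3$, which the recursion reproduces.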

In order to compare the performance of the suboptimal estimator with
that of the optimal estimator, we have obtained expressions for the mean
squared error associated with the optimal estimator. For the optimal system
we are interested in

$$E\{(Z_{N+1})^{2k+1} \mid Z_N,\ldots,Z_1\}.$$

Notice that this is the (2k+1)-st conditional moment and the conditional
distribution has the functional form of a Gaussian distribution. Thus the
minimum mean squared error follows using standard properties of the Gaussian
distribution (see, for example, [7]). For k=1 we find that the minimum
mean squared error is given by

$$15 - P^4\big[9E\{Y^2\} + 6P\,E\{Y^4\} + P^2 E\{Y^6\}\big]$$

and for k=2, the minimum mean squared error is given by

$$945 - P^6\big[225E\{Y^2\} + 300P\,E\{Y^4\} + 130P^2 E\{Y^6\} + 20P^3 E\{Y^8\} + P^4 E\{Y^{10}\}\big].$$

In these expressions P is a constant and Y is a normal random variable with
zero mean and variance $\gamma^2$. The constants P and $\gamma^2$ are defined as follows.
Assume without loss of generality that the correlation matrix R associated
with $Z_1,\ldots,Z_{N+1}$ is positive definite (if it is not, the data can be
reduced to achieve this result). Then P is the reciprocal of the element in
the lower right corner of $R^{-1}$. Denote the first N elements in the last row
of $R^{-1}$ as $r_1,\ldots,r_N$. Then

$$\gamma^2 = \sum_{n=1}^{N} (r_n)^2 + 2\sum_{m=1}^{N-1}\sum_{n=1}^{m} r_{N-n+1}\, r_{m-n+1}\, \rho(N-m).$$
m=l  n=l 


The  mean  squared  error  associated  with  the  optimal  linear  filter  can 
be  obtained  in  a  straightforward  fashion. 
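As a hedged numerical sketch of these two benchmark errors for k=1, the code below builds the correlation matrix R from $\rho(1),\ldots,\rho(N)$, evaluates the minimum mean squared error from P and $\gamma^2$, and computes the optimal linear filter error from the covariance $E\{(Z_a)^3 (Z_b)^3\} = 9\rho + 6\rho^3$. The function names are hypothetical, and $\gamma^2$ is computed as the quadratic form $r^T R_N r$, which equals the double sum in the text. The test values $\rho(1)=0.9$, $\rho(2)=0.7$ correspond to the first correlation sequence of Table 6.

```python
import numpy as np

def correlation_matrix(rho):
    """(N+1) x (N+1) correlation matrix of Z_1..Z_{N+1} from rho(1)..rho(N)."""
    seq = np.concatenate(([1.0], np.asarray(rho, dtype=float)))
    idx = np.arange(len(seq))
    return seq[np.abs(idx[:, None] - idx[None, :])]

def even_moment(p, variance):
    """E{Y^p} for zero mean Gaussian Y and even p."""
    out = 1.0
    for m in range(p - 1, 0, -2):
        out *= m
    return out * variance ** (p // 2)

def optimal_mse_k1(rho):
    """Minimum mean squared error for X_n = Z_n^3 (k = 1)."""
    R = correlation_matrix(rho)
    n_obs = R.shape[0] - 1
    R_inv = np.linalg.inv(R)
    P = 1.0 / R_inv[-1, -1]                    # reciprocal of lower right element
    r = R_inv[-1, :n_obs]                      # first N elements of the last row
    gamma2 = float(r @ R[:n_obs, :n_obs] @ r)  # variance of Y = sum_n r_n Z_n
    bracket = (9 * even_moment(2, gamma2)
               + 6 * P * even_moment(4, gamma2)
               + P ** 2 * even_moment(6, gamma2))
    return float(15.0 - P ** 4 * bracket)

def linear_mse_k1(rho):
    """Mean squared error of the optimal linear filter for X_n = Z_n^3."""
    R = correlation_matrix(rho)
    n_obs = R.shape[0] - 1
    C = 9.0 * R + 6.0 * R ** 3                 # E{Z_a^3 Z_b^3}; the diagonal is 15
    c = C[-1, :n_obs]
    return float(C[-1, -1] - c @ np.linalg.solve(C[:n_obs, :n_obs], c))

l_min = optimal_mse_k1([0.9, 0.7])
l_1 = linear_mse_k1([0.9, 0.7])
```

For this correlation sequence the two values agree with the first row of Table 8 (3.1354 and 3.7487) to the four decimals printed there.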

In the following tables results are presented comparing the suboptimal
filter to the optimal filter and the optimal linear filter. Several
correlation sequences for $\{Z_n\}$ are considered, both the third power and the
fifth power of $Z_n$ are used as models, and examples for two observations
and five observations are given. In these tables $L_1$, $L$, and $L_{\min}$ are the
mean squared errors resulting from the optimal linear filter, suboptimal
filter using a ZNL, and the optimal filter, respectively. The quantity
$\eta_1$ is the percent of decrease in $L_1$ when the suboptimal filter using a ZNL
is employed, i.e. $\eta_1 = 100(L_1 - L)/L_1$. The quantity $\eta_2$ is the percent of
possible improvement in $L_1$ using the optimal filter, i.e.
$\eta_2 = 100(L_1 - L_{\min})/L_1$. The quantity $\eta_3$ is the normalized percent of
improvement over the linear filter given by the suboptimal filter using a ZNL,
i.e. $\eta_3 = 100\,\eta_1/\eta_2 = 100(L_1 - L)/(L_1 - L_{\min})$.
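The three percentages are simple functions of the three errors. The short sketch below evaluates them, using as input the values that Table 8 lists for its fourth correlation sequence; the function name is hypothetical.

```python
def improvement_percentages(l_1, l, l_min):
    """Return (eta_1, eta_2, eta_3) as defined in the text."""
    eta_1 = 100.0 * (l_1 - l) / l_1           # gain of the ZNL filter over the linear filter
    eta_2 = 100.0 * (l_1 - l_min) / l_1       # largest possible gain
    eta_3 = 100.0 * (l_1 - l) / (l_1 - l_min) # fraction of the possible gain achieved
    return eta_1, eta_2, eta_3

eta_1, eta_2, eta_3 = improvement_percentages(8.9825, 7.1689, 4.9674)
```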


Sequence   ρ(1)     ρ(2)     ρ(3)     ρ(4)     ρ(5)
5          .425     .2375    .14675   .09448   .06207
6          .8133    .6666    .5       .3333    .1666
7          .5787    .2963    .125     .037     .00463
8          .4822    .1975    .0625    .0123    .00077

Table 1. Correlation sequences corresponding to Tables 2-5.


Sequence   L1        L         Lmin      η1     η2     η3
1          9.1983    8.8614    8.8581    3.6    3.69   97.3
2          5.1744    5.0622    5.0599    2.16   2.21   97.6
3          12.5987   12.1084   12.108    3.89   3.89   99.8
4          12.3196   11.9216   11.8952   3.23   3.44   93.7
5          13.6849   13.2957   13.293    2.84   2.86   99.1
6          6.9247    6.6228    6.4926    4.36   6.23   69.8
7          12.2903   11.732    11.7259   4.54   4.59   98.8
8          13.3219   12.8142   12.8123   3.81   3.82   99.6

Table 2. Mean squared errors and percentages of improvement for k = 1.


Sequence   L1       L        Lmin     η1     η2     η3
1          727.42   704.58   704.22   3.13   3.18   98.1
2          453.78   444.78   444.49   1.98   2.04   96.7
3          887.49   859.95   859.9    3.1    3.1    99.7
4          879.44   854.59   851.86   2.82   3.13   89.8
5          920.93   899.7    899.43   2.3    2.33   98.5
6          584.57   564.58   550.99   3.41   5.74   59.3
7          876.33   845.86   845.24   3.47   3.54   97.7
8          910.86   884.62   884.42   2.88   2.9    99.2

Table 3. Mean squared errors and percentages of improvement for k = 2.


Sequence   h0      h1       h2       h3       h4       a1       a2
1          .6115   .0127    .008     .0059    .0094    1.519    .6811
2          .7899   .0084    .0064    .0051    .0132    .674     .862
3          .4026   .0093    .0049    .0029    .0024    2.7896   .4114
4          .3654   .0687    .0435    .0297    .028     2.4749   .4334
5          .2827   .0407    .0164    .0076    .0047    3.3779   .2656
6          .776    -.0234   -.0175   -.0111   -.0662   1.2015   .7635
7          .4476   -.028    -.018    -.01     -.0015   2.775    .4375
8          .3505   -.0247   -.0121   -.0032   -.0017   3.342    .3215

Table 4. The coefficients a_i of the nonlinearity g(x) = a_2 x + a_1 x^(1/3) and
the h_i's of the suboptimal system for k = 1.


Sequence   h0      h1       h2       h3       h4      a1        a2       a3
1          .4779   .0119    .0063    .0042    .0052   4.0527    3.7727   .493
2          .7065   .0097    .0067    .005     .0093   .7136     2.032    .7585
3          .2563   .0059    .0028    .0017    .0014   15.173    4.5019   .196
4          .2466   .0472    .0282    .019     .017    11.733    4.465    .1966
5          .162    .0227    .009     .0043    .0026   23.858    3.802    .0839
6          .6534   -.034    -.0234   -.0136   -.024   2.742     2.9562   .6302
7          .2864   -.0184   -.0096   -.0054   .0002   14.7841   4.5769   .2267
8          .2032   -.0139   -.0065   -.0019   .0008   22.373    4.2663   .128

Table 5. The coefficients of the ZNL g(x) = a_3 x + a_2 x^(3/5) + a_1 x^(1/5) and
the h_i's of the suboptimal system for k = 2.


Sequence   ρ(1)   ρ(2)
1          .9     .7
2          .8     .5
3          .8     .3
4          .7     .1

Table 6. Correlation sequences corresponding to Tables 7-8.

Sequence   h0       h1       a1       a2
1          1.2377   -.4974   .9333    .82983
2          .8837    -.3001   1.6639   .6923
3          1.095    -.6467   2.3987   .6089
4          .7927    -.4786   3.2982   .4545

Table 7. The coefficients a_i of the ZNL g(x) = a_2 x + a_1 x^(1/3) and the h_i's
of the suboptimal system for k = 1.

Sequence   L1       L        Lmin     η1      η2     η3
1          3.7487   3.494    3.1354   6.79    16.3   41.65
2          7.566    7.0273   6.7406   7.12    10.9   65.32
3          5.7804   4.371    1.0231   24.38   82.3   29.62
4          8.9825   7.1689   4.9674   20.19   44.7   45.16

Table 8. Mean squared errors and percentages of improvement for k = 1.

ACKNOWLEDGEMENT

D. Minoo-Hamedani was supported by the Air Force Office of Scientific
Research under Grant AFOSR-76-3062 and by the Department of Defense Joint
Services Electronics Program under Contract F49620-77-C-0101. G. L. Wise
was supported by the Air Force Office of Scientific Research under Grant
AFOSR-76-3062. N. C. Gallagher and T. E. McCannon were supported by the
Air Force Office of Scientific Research under Grant AFOSR-78-3605.

REFERENCES

1. J.L. Doob, Stochastic Processes, John Wiley, New York, 1953.

2. T. Kailath, ed., Linear Least-Squares Estimation, Dowden, Hutchinson,
and Ross, Stroudsburg, Pennsylvania, 1977.

3. I.F. Blake and J.B. Thomas, "On a Class of Processes Arising in Linear
Estimation Theory," IEEE Trans. Inform. Theory, Vol. IT-14, pp. 12-16,
January 1968.

4. A.V. Balakrishnan, "On a Characterization of Processes for which Optimal
Mean-Square Systems are of Specified Form," IRE Trans. Inform. Theory,
Vol. IT-6, pp. 490-500, September 1960.

5. G.L. Wise and J.B. Thomas, "A Characterization of Markov Sequences,"
Journal of the Franklin Institute, Vol. 299, pp. 269-278, April 1975.

6. N.L. Johnson and S. Kotz, Distributions in Statistics: Continuous
Multivariate Distributions, p. 91, John Wiley, New York, 1972.

7. K.S. Miller, Multidimensional Gaussian Distributions, pp. 21-22, John
Wiley, New York, 1964.




A NOVEL APPROACH FOR DESIGNING NON-LINEAR DISCRETE TIME FILTERS: PART II


T.E. MCCANNON & N.C. GALLAGHER
School of Electrical Engineering
Purdue University
W. Lafayette, IN 47907

G.L. WISE & D. MINOO-HAMEDANI
Department of Electrical Engineering
University of Texas
Austin, Texas 78712


ABSTRACT 

We propose two methods for designing nonlinear discrete time filters.
The first method involves an iteration procedure that, for simple cases,
converges in one or two iterations. However, convergence problems in this
approach for higher (> 1) time order filters lead to a second method, which
is based upon an augmented Hilbert subspace on which the orthogonality
principle can be easily applied.

I.  INTRODUCTION 

In many problems, it can be shown that a non-linear filter either
outperforms the linear filter or performs a function not possible with a
linear filter. One example of this is the homomorphic processing of speech,
which utilizes a linear processor followed by a non-linearity that is
followed by another linear processor [1].

This paper is concerned with the non-linear prediction problem. We
consider the system shown in Fig. 1, where we investigate two different
methods of design.

[Block diagram not reproduced; recoverable block label: LTI digital filter.]

Fig. 1. Non-linear System Under Study.

In Section II, we consider an iterative scheme and give examples of its
use. In Section III, we develop a new non-iterative technique motivated by
the poor performance of the iterative scheme found in several non-trivial
examples. It is also worthwhile to point out that in Part I of this paper,
results are obtained based on complete knowledge of the process statistics,
while in Part II we only assume that we have a finite sequence of samples
from the random process. We also require the random process to be Wide
Sense Stationary (WSS) and to have finite higher order moments.

II.  ITERATIVE  FILTER  DESIGN  PROCEDURE 

We  propose  the  following  iterative  procedure  for  determining  the  MMSE 
filter  coefficients  for  the  system  of  Fig.  1. 

(1)  Assume  the  non-linearity  is  not  present  and  design  the 
optimum  (Wiener)  linear  filter. 

(2)  Keeping the unit pulse response of the linear filter constant,
compute the polynomial coefficients required to minimize the mse.

(3)  With the polynomial coefficients fixed, redesign the
optimum linear filter with the polynomial non-linearity.

(4)  Repeat Steps (2) and (3) until convergence.

Presented at the Sixteenth Annual Allerton Conference on Communication,
Control, and Computing, October 4-6, 1978.


Consider  the  example  where  we  have  a  second  degree  polynomial  and  a 
first  order  linear  filter  as  shown  in  Fig.  2. 


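[Block diagram not reproduced; recoverable labels: input $x_{n-1}$, polynomial non-linearity $a_0 + a_1 x + a_2 x^2$, filter coefficient $h_1$.]

Fig. 2. Example of Non-linear System.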

Step 1 tells us to design the optimum linear filter assuming the
non-linearity is not present. This linear predictor is given by

$$\hat{x}_n = h_1 x_{n-1}$$

where $\hat{x}_n$ = estimate of the $n$th sample of the r.p. $x(t)$,

$x_{n-j}$ = actual value of the $(n-j)$th sample of $x(t)$.

Using the orthogonality principle

$$E\{(x_n - h_1 x_{n-1})\, x_{n-1}\} = 0$$

we find the optimum linear filter to be

$$h_1 = \frac{R_x(1)}{R_x(0)} \qquad (1)$$

where $R_x(j) = E\{x_n x_{n-j}\}$ and we have made use of the fact that the r.p. is
W.S.S.

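Step 2 tells us to compute $a_0$, $a_1$, and $a_2$ to minimize the mse keeping
$h_1$ constant. This non-linear predictor is given by

$$\hat{x}_n = h_1\,[a_0 + a_1 x_{n-1} + a_2 x_{n-1}^2] \qquad (2)$$

Using this expression in the mse equation

$$\text{mse} = E\{(x_n - \hat{x}_n)^2\}$$

we have that

$$\text{mse} = E\{[h_1(a_0 + a_1 x_{n-1} + a_2 x_{n-1}^2) - x_n]^2\}$$

We minimize the mse with respect to the filter coefficients $a_0$, $a_1$, and $a_2$
by taking partial derivatives and setting them to zero,

$$\frac{\partial\,\text{mse}}{\partial a_i} = 0, \qquad i = 0, 1, 2.$$

Then the coefficients $a_0$, $a_1$ and $a_2$ that satisfy the following set of
matrix equations are computed:

$$\begin{bmatrix} E\{x_n x_{n-1}^2\} \\ E\{x_n x_{n-1}\} \\ E\{x_n\} \end{bmatrix}
= \begin{bmatrix}
h_1 E\{x_{n-1}^4\} & h_1 E\{x_{n-1}^3\} & h_1 E\{x_{n-1}^2\} \\
h_1 E\{x_{n-1}^3\} & h_1 E\{x_{n-1}^2\} & h_1 E\{x_{n-1}\} \\
h_1 E\{x_{n-1}^2\} & h_1 E\{x_{n-1}\} & h_1
\end{bmatrix}
\begin{bmatrix} a_2 \\ a_1 \\ a_0 \end{bmatrix} \qquad (3)$$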

Step 3 tells us to compute the new optimum linear filter with $a_0$, $a_1$ and
$a_2$ constant. The orthogonality principle together with Eq. (2) gives

$$E\{[x_n - h_1(a_2 x_{n-1}^2 + a_1 x_{n-1} + a_0)]\, x_{n-1}\} = 0,$$

and we find the new linear filter to be

$$h_1 = \frac{E\{x_n x_{n-1}\}}{a_2 E\{x_{n-1}^3\} + a_1 E\{x_{n-1}^2\} + a_0 E\{x_{n-1}\}} \qquad (4)$$

If we solve Eq. (3) for $a_0$, $a_1$ and $a_2$ and substitute these values into Eq.
(4) we find

$$h_1^{\text{new}} = \frac{R_x(1)}{R_x(0)} = h_1 \qquad (5)$$


where the second equality follows from Eq. (1). It is seen that for the
non-linear predictor of Fig. 2, the iteration procedure converges in one
iteration for an arbitrary W.S.S. r.p. $x(t)$ with finite first, second and
third order moments. As an example, consider the following signal

$$x_n = k_1 x_{n-1}^2 + k_2 + p_1 u_n$$

where $\{u_n\}$ are iid, uniform $(-\tfrac{1}{2}, \tfrac{1}{2})$. We can easily obtain the optimum MMSE
predictor [2], [3] for this signal by utilizing the conditional expectation

$$E\{x_n \mid x_{n-1}, x_{n-2}, \ldots\} = k_1 x_{n-1}^2 + k_2 \qquad (6)$$

since $E\{u_n \mid x_{n-1}, x_{n-2}, \ldots\} = 0$. If we let $p_1 = .005$, $k_1 = -1.74$ and
$k_2 = 0.87$, the linear filter gives a mse = .293. Calculating the required
moments needed for the solution of Eq. (3) empirically with a computer, we
find the values of $a_0$, $a_1$ and $a_2$ to be

$a_0 = -3.960261$
$a_1 = .005525$
$a_2 = 7.913872$

By computer simulation, we find that the non-linear system gives a
mse = $2 \times 10^{-6}$, a significant improvement over that obtained with the linear
filter alone. It is possible to analytically solve for the optimum
predictor coefficients; we find the values of $a_0$, $a_1$ and $a_2$ by equating Eqs.
(2) and (6) and also using Eq. (1). The values are found to be

$a_0 = -3.960017$
$a_1 = 0$
$a_2 = 7.920035$

This result agrees very well with the computer simulation.
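
The Step 1 and Step 2 computations for this example can be sketched in a few lines of Python. This is only an illustrative reconstruction, not the authors' program: the sample size, the seed, the starting value, the discarded transient and the helper names (`simulate`, `solve`, `mean`) are all assumptions, and the noise is taken to be uniform on (-1/2, 1/2). Moments are replaced by sample averages, $h_1$ follows Eq. (1), and the polynomial coefficients come from the 3-by-3 system of Eq. (3) with unknowns ordered $[a_2, a_1, a_0]$.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def simulate(n, k1=-1.74, k2=0.87, p1=0.005, seed=0):
    """Generate x_n = k1*x_{n-1}^2 + k2 + p1*u_n with u_n ~ uniform(-1/2, 1/2)."""
    rng = random.Random(seed)          # seed and start value are arbitrary choices
    x = [0.3]
    for _ in range(n + 200):
        x.append(k1 * x[-1] ** 2 + k2 + p1 * (rng.random() - 0.5))
    return x[200:]                     # drop an (assumed) start-up transient

def mean(values):
    return sum(values) / len(values)

x = simulate(50000)
cur, prev = x[1:], x[:-1]

# Step 1, Eq. (1): h1 = R_x(1) / R_x(0), estimated from sample averages.
h1 = mean([c * p for c, p in zip(cur, prev)]) / mean([p * p for p in prev])

# Step 2, Eq. (3): sample moments m[q] ~ E{x_{n-1}^q}, unknowns [a2, a1, a0].
m = [mean([p ** q for p in prev]) for q in range(5)]
A = [[h1 * m[4], h1 * m[3], h1 * m[2]],
     [h1 * m[3], h1 * m[2], h1 * m[1]],
     [h1 * m[2], h1 * m[1], h1 * m[0]]]
b = [mean([c * p * p for c, p in zip(cur, prev)]),
     mean([c * p for c, p in zip(cur, prev)]),
     mean(cur)]
a2, a1, a0 = solve(A, b)

mse_linear = mean([(c - h1 * p) ** 2 for c, p in zip(cur, prev)])
mse_nonlinear = mean([(c - h1 * (a0 + a1 * p + a2 * p * p)) ** 2
                      for c, p in zip(cur, prev)])
print("h1 =", h1, " a0, a1, a2 =", a0, a1, a2)
print("linear mse =", mse_linear, " non-linear mse =", mse_nonlinear)
```

Because the products $h_1 a_2$, $h_1 a_1$ and $h_1 a_0$ are the effective coefficients of the predictor, they should land near $k_1$, 0 and $k_2$, and the non-linear mse should sit near the noise variance $p_1^2/12$.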

Next, consider the example as shown in Fig. 3.

Fig. 3. Example of Non-linear System.

We again apply the iteration procedure as outlined by steps (1), (2) and
(3) above. Applying the signal

$$x_n = k_1 x_{n-1}^2 + p_1 u_n \qquad (7)$$

where $\{u_n\}$ are iid, uniform $(-\tfrac{1}{2}, \tfrac{1}{2})$, we can easily show the procedure
terminates after two iterations. However, if we use a general second order
polynomial

$$a_2 x^2 + a_1 x + a_0$$

simulation with the signal of Eq. (7) indicates that convergence is very
slow unless the initial choices of $a_2$, $a_1$, $a_0$, $h_1$ and $h_2$ are close to the
optimum solutions, and then convergence occurs in 2 to 3 iterations.
Simulation also shows strong dependence of the final solution on the
initial choices of $a_2$, $a_1$, $a_0$, $h_1$ and $h_2$. In an attempt to force the solution
to the optimum result for the general case, we propose the following
modified iteration procedure.

(1)  Set $h_1 = h_2 = \cdots = h_k = 1$.

(2)  Compute the polynomial coefficients required to minimize
the mse.

(3)  Design the optimum linear filter using the polynomial
non-linearity.

(4)  Repeat Steps (2) and (3) until convergence.

But even with the modified procedure, simulation indicates sluggish
convergence. Fig. 4 demonstrates this convergence with plots of mse versus
number of iterations for the polynomial non-linearity $a_2 x^2 + a_1 x + a_0$.

Fig. 4. Demonstration of Convergence. Curve (A) shows mse vs. N
where the initial values of $h_1$ and $h_2$ are the optimum
linear filter coefficients. Curve (B) shows mse vs. N
for the initial values $h_1 = h_2 = 1$. Curve (C) shows
mse vs. N where the initial values of $h_1$ and $h_2$ are the
optimum non-linear coefficients.

It appears that this method only works well for very simple structures and
for more general cases another type of design procedure is required.

III.  NON  ITERATIVE  FILTER  DESIGN  METHOD 

In Section II, we have studied an iterative procedure for the design of
the non-linear predictor in Fig. 1. This system leads to a prediction
better than that obtained from the linear predictor, although in many cases
the improvement is not significant. We would, however, like to retain the
basic structure of Fig. 1, which can also be implemented as shown in Figs.
5 and 6.

Fig. 5. Alternate Structure to the Non-linear System of Fig. 1.

Fig. 6. Equivalent Structure to the Non-linear System of Fig. 5.

We can now express the non-linear predictor of Fig. 5 as

$$\hat{x}_n = \sum_{j=1}^{k} \left[ \sum_{i=0}^{m} h_{ij}\, x_{n-j}^{i} \right] \qquad (8)$$

Defining

$$h_0 \triangleq \sum_{j=1}^{k} h_{0j}$$

where the $h_{0j}$ are the constants multiplying the $x_{n-j}^{0}$ terms, we can also
write Eq. (8) as

$$\hat{x}_n = h_0 + \sum_{i=1}^{m} \sum_{j=1}^{k} h_{ij}\, x_{n-j}^{i} \qquad (9)$$

We now minimize the mse with respect to the coefficients $h_{ij}$ and $h_0$, where

$$\text{mse} = E\{(x_n - \hat{x}_n)^2\} \qquad (10)$$

by setting the partial derivatives of the mse with respect to $h_0$ and each
$h_{ij}$ equal to zero.

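Consider the structure where $m = k = 2$. Substituting $m = k = 2$ into Eq.
(9) we have the non-linear predictor

$$\hat{x}_n = h_0 + h_{11} x_{n-1} + h_{21} x_{n-1}^2 + h_{12} x_{n-2} + h_{22} x_{n-2}^2 \qquad (11)$$

When we substitute Eq. (11) into Eq. (10) and minimize the mse by taking
derivatives, we find the coefficients must satisfy

$$\begin{bmatrix}
1 & E\{x_{n-1}\} & E\{x_{n-1}^2\} & E\{x_{n-2}\} & E\{x_{n-2}^2\} \\
E\{x_{n-1}\} & E\{x_{n-1}^2\} & E\{x_{n-1}^3\} & E\{x_{n-1}x_{n-2}\} & E\{x_{n-1}x_{n-2}^2\} \\
E\{x_{n-1}^2\} & E\{x_{n-1}^3\} & E\{x_{n-1}^4\} & E\{x_{n-1}^2x_{n-2}\} & E\{x_{n-1}^2x_{n-2}^2\} \\
E\{x_{n-2}\} & E\{x_{n-1}x_{n-2}\} & E\{x_{n-1}^2x_{n-2}\} & E\{x_{n-2}^2\} & E\{x_{n-2}^3\} \\
E\{x_{n-2}^2\} & E\{x_{n-1}x_{n-2}^2\} & E\{x_{n-1}^2x_{n-2}^2\} & E\{x_{n-2}^3\} & E\{x_{n-2}^4\}
\end{bmatrix}
\begin{bmatrix} h_0 \\ h_{11} \\ h_{21} \\ h_{12} \\ h_{22} \end{bmatrix}
= \begin{bmatrix} E\{x_n\} \\ E\{x_n x_{n-1}\} \\ E\{x_n x_{n-1}^2\} \\ E\{x_n x_{n-2}\} \\ E\{x_n x_{n-2}^2\} \end{bmatrix} \qquad (12)$$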

It is seen that the solution requires knowledge of the various moments and
cross moments. Since the r.p. is assumed W.S.S., we can apply well known
procedures to estimate these various moments empirically. Now consider the
example where the signal is generated by use of the equation

$$x_n = k_1 x_{n-1}^2 + k_2 + p_1 u_n$$

where $\{u_n\}$ are iid, uniform $(-\tfrac{1}{2}, \tfrac{1}{2})$. From Eq. (6), we know that the optimum
predictor is given by

$$\hat{x}_n = k_1 x_{n-1}^2 + k_2 \qquad (13)$$

This optimum result corresponds to the solution $h_0 = k_2$, $h_{21} = k_1$,
$h_{11} = h_{12} = h_{22} = 0$.

It is easily shown that this solution satisfies Eq. (12), and hence
Eq. (12) leads to the optimum result for the signal in Eq. (13). Likewise,
for the general polynomial signal of the form

$$x_n = \sum_{\alpha} \sum_{\beta} \gamma_{\alpha\beta}\, x_{n-\alpha}^{\beta} + p_1 u_n \qquad (14)$$

(the summation limits are illegible in the source),

Eq. (8) again leads to the optimum solution. These results can also be
interpreted in an alternate way. First, define a Hilbert Space over the
probability space and the set of r.v. $x$ such that [4], [5]

$$E\{|x|^2\} < \infty$$

with the inner product defined as

$$\langle x, y \rangle = E\{xy\}$$

We then generate the smallest subspace that contains the elements of the
form

$$(x_n)^i - \mu_i$$

where $x_n$ is the $n$th sample of the r.p. $x(t)$ and $\mu_i$ is chosen so that

$$E\{(x_n)^i - \mu_i\} = 0.$$

The condition

$$E\{|(x_n)^i - \mu_i|^2\} < \infty$$

implies that all moments of the r.p. of interest up to and including the $(2m)$th
moment ($m$ = highest degree polynomial used in the predictor) be finite.
Using this augmented subspace, we then have a predictor for $x_n$, denoted by
$\hat{x}_n$, given by

$$\hat{x}_n = \sum_{i=0}^{m} \sum_{j=1}^{k} h_{ij}\,(x_{n-j}^{i} - \mu_i) \qquad (15)$$

Expanding Eq. (15) and grouping all the constant terms together, the
predictor for $x_n$ becomes

$$\hat{x}_n = h_0 + \sum_{i=1}^{m} \sum_{j=1}^{k} h_{ij}\, x_{n-j}^{i} \qquad (16)$$

which is identical to Eq. (8). We can use the orthogonality principle to
determine the $h_{ij}$'s. Consequently, we must solve the following set of
equations

$$E\Big\{\Big(x_n - h_0 - \sum_{i=1}^{m} \sum_{j=1}^{k} h_{ij}\, x_{n-j}^{i}\Big)\, x_{n-p}^{l}\Big\} = 0, \qquad l = 0, 1, \ldots, m; \quad p = 1, 2, \ldots, k \qquad (17)$$

When $m = k = 2$, Eq. (17) leads to Eq. (12). Again consider the signal

$$x_n = -1.74\, x_{n-1}^2 + .87 + .005\, u_n$$

where $\{u_n\}$ are iid, uniform $(-\tfrac{1}{2}, \tfrac{1}{2})$. Simulation for the case $k = 1$ shows
excellent agreement with the known optimum result. However, the
determinant of the coefficient matrix vanishes for the case $k = 2$. This is
explained by the observation that the signal can also be represented as

$$x_n = -1.74\, x_{n-1}^2 + .87(\alpha + \beta) + .005\, u_n$$

where $\alpha + \beta = 1$. Since

$$x_{n-1} = -1.74\, x_{n-2}^2 + .87 + .005\, u_{n-1},$$

we have

$$.87\beta = \beta x_{n-1} + 1.74\,\beta x_{n-2}^2 - .005\,\beta u_{n-1},$$

and hence

$$x_n = -1.74\, x_{n-1}^2 + .87\alpha + \beta x_{n-1} + 1.74\,\beta x_{n-2}^2 + .005\, u_n - .005\,\beta u_{n-1}.$$

We note that there are an infinite number of equivalent signal
representations and therefore an infinite number of equivalent predictors. This
leads to the following design procedure.

(1)  Set $m$ (highest desired polynomial degree).

(2)  Set $k = 1$ ($k$ = number of past samples used in the prediction).

(3)  Solve for the $h_{ij}$.

(4)  Set $k = 2$. Compute the determinant of the coefficient matrix.
If the determinant is zero, terminate; otherwise proceed to
step (5).

(5)  Solve for the $h_{ij}$.

(6)  Continue incrementing $k$, either until the determinant
vanishes or a desired value of $k$ is reached.
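
A minimal Python sketch of this design procedure follows, under several assumptions that are not in the paper: moments are replaced by sample averages over a simulated record, the basis for a given $k$ is ordered $[1, x_{n-1}, \ldots, x_{n-1}^m, x_{n-2}, \ldots]$, and, because an empirical determinant is never exactly zero, "the determinant vanishes" is replaced by "the determinant falls below `tol` times its value at $k = 1$". The function names and the threshold are hypothetical.

```python
import random

def simulate(n, k1=-1.74, k2=0.87, p1=0.005, seed=1):
    """x_n = k1*x_{n-1}^2 + k2 + p1*u_n, u_n ~ uniform(-1/2, 1/2) (assumed)."""
    rng = random.Random(seed)
    x = [0.3]
    for _ in range(n + 200):
        x.append(k1 * x[-1] ** 2 + k2 + p1 * (rng.random() - 0.5))
    return x[200:]

def moment_system(x, m, k):
    """Sample version of the normal equations for the basis [1, x_{n-j}^i]."""
    d = 1 + m * k
    G = [[0.0] * d for _ in range(d)]
    r = [0.0] * d
    count = len(x) - k
    for n in range(k, len(x)):
        phi = [1.0] + [x[n - j] ** i for j in range(1, k + 1)
                       for i in range(1, m + 1)]
        for a in range(d):
            r[a] += phi[a] * x[n] / count
            for b in range(d):
                G[a][b] += phi[a] * phi[b] / count
    return G, r

def det_and_solve(G, r):
    """Gaussian elimination; returns (determinant, solution or None)."""
    n = len(r)
    M = [row[:] + [v] for row, v in zip(G, r)]
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        if M[p][c] == 0.0:
            return 0.0, None
        if p != c:
            M[c], M[p] = M[p], M[c]
            det = -det
        det *= M[c][c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    h = [0.0] * n
    for i in range(n - 1, -1, -1):
        h[i] = (M[i][n] - sum(M[i][j] * h[j] for j in range(i + 1, n))) / M[i][i]
    return det, h

def design(x, m, k_max, tol):
    """Increase k until the determinant (relative to k = 1) is negligible."""
    best, det_ref = None, None
    for k in range(1, k_max + 1):
        det, h = det_and_solve(*moment_system(x, m, k))
        if det_ref is None:
            det_ref = abs(det)
        if h is None or abs(det) < tol * det_ref:
            break                      # numerically singular: keep previous k
        best = (k, h)
    return best

x = simulate(20000)
k_sel, h = design(x, m=2, k_max=3, tol=1e-4)
print("selected k =", k_sel, " coefficients [h0, h11, h21] =", h)
```

For the quadratic signal used in the text the sketch stops at $k = 1$, since the $k = 2$ moment matrix is nearly singular, and the returned coefficients approximate $h_0 = k_2$, $h_{11} = 0$, $h_{21} = k_1$.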

We also note that functions other than polynomials can be used in the
predictor. In this case, the predictor is of the form

$$\hat{x}_n = \sum_{i=1}^{m} \sum_{j=1}^{k} h_{ij}\, f_i(x_{n-j}),$$

where we assume the r.v. $f_i(x_{n-j})$ possesses the proper second moment
properties. In addition, the $f_i(x)$ should be continuous and bounded over
the range of arguments to insure that the augmented subspace is complete
and the condition

$$E\{|f_i(x)|^2\} < \infty$$

is satisfied.

IV.  SUMMARY  AND  CONCLUSIONS 


In this paper, we investigate two methods of designing non-linear
discrete-time filters. The first method makes use of an iterative procedure
that alternately computes the linear filter coefficients and the
non-linearity coefficients. We show how this procedure performs by applying
it to several examples. Because the resulting filter design is dependent
on the initial conditions before iteration, this method is only applicable
to certain problems. For example, this procedure appears acceptable when
the starting point of the iteration is close to the optimum design. We
then present a second, non-iterative procedure that makes use of the
orthogonality principle over an augmented subspace. The performance of the
resulting design is tested by use of several examples and is shown to
provide excellent results. This method appears to work well even when the
general form of the optimum filter is not known a priori.


REFERENCES 

[1]  Lawrence R. Rabiner and Bernard Gold, Theory and Application of
Digital Signal Processing, Prentice Hall, Englewood Cliffs,
New Jersey (page reference illegible in the source).

[2]  Harry L. Van Trees, Detection, Estimation and Modulation Theory,
Part I, John Wiley and Sons, Inc., New York, New York (page range
beginning at p. 52; end page illegible in the source).

[3]  Mischa Schwartz and Leonard Shaw, Signal Processing: Discrete
Spectral Analysis, Detection and Estimation, McGraw-Hill,
pp. 275-314.

[4]  Athanasios Papoulis, Probability, Random Variables and Stochastic
Processes, McGraw-Hill (page range beginning at p. 385; end page
illegible in the source).

[5]  H. Cramer and M.R. Leadbetter, Stationary and Related Stochastic
Processes, John Wiley & Sons, Inc., New York, p. 96.


ACKNOWLEDGEMENT 


T.E. McCannon and N.C. Gallagher were supported by the Air Force
Office of Scientific Research under grant AFOSR 78-3605. G. L. Wise was
supported by the Air Force Office of Scientific Research under grant
AFOSR 76-3062. D. Minoo-Hamedani was supported by the Air Force Office
of Scientific Research under grant AFOSR 76-3062 and by the Department of
Defense Joint Services Electronics Program under contract F49620-77-C-0101.




QUANTIZATION OF BIVARIATE CIRCULARLY SYMMETRIC DENSITIES

J. A. BUCKLEW & N. C. GALLAGHER
School of Electrical Engineering
Purdue University
West Lafayette, IN 47907


ABSTRACT 

The problem of quantizing a two dimensional random variable whose
bivariate density has circular symmetry is considered in detail. Two
quantization methods are considered, leading to polar and rectangular
representations. A simple necessary and sufficient condition is derived to
determine which of these two quantization schemes is best. If polar
quantization is deemed best, the question arises as to the ratio of the number of
phase quantizer levels to that of magnitude quantizer levels when the
product of these numbers is fixed. A simple expression is derived for this
ratio that depends only upon the magnitude distribution. Several examples
of common circularly symmetric bivariate densities are worked out in
detail using these expressions.

I.  INTRODUCTION 

Consider a two dimensional random variable X whose bivariate density
is circularly symmetric, and suppose we desire to represent this quantity by a
finite set of values. One possible representation of X leads to a
Cartesian co-ordinate system expression wherein we individually quantize
the two rectangular components of the random variable. Another common
representation leads to a polar co-ordinate representation where we quantize
the magnitude and phase angle of X. These two representations are mainly
chosen for their computational feasibility and ease of implementation.

Other authors have considered the general problem of multidimensional
quantization; Zador [1] derives an expression for the minimum error achievable
by a multidimensional quantizer for an arbitrary density, but no insight
into the required quantizer structure is attained. Chen [2] describes a
technique whereby one can use a recursive computer technique to solve for a
"good" quantizer, but the optimality of the final solution is not assured.
By constraining ourselves to circularly symmetric densities and also to
either Cartesian or polar co-ordinate systems, it becomes possible to
reduce the optimal two dimensional quantization problem to one dimension.

Max [3] develops necessary conditions for the optimality of a one
dimensional quantizer. Panter and Dite [4] give a formula for the asymptotic
error to be expected for optimal mean square error quantizers (of
sufficiently smooth input densities).

In Section II of this paper we obtain a simple criterion by which to
determine whether polar format or rectangular format gives a smaller mean
square quantization error. It is shown for some very important cases,
notably for the Gaussian bivariate density, that polar format is
asymptotically superior.

If polar format is to be used and the product $N = N_\theta N_r$ is fixed, where
$N_\theta$ and $N_r$ are the number of phase and magnitude quantization levels,
respectively, the question arises as to the optimum ratio $N_\theta/N_r$. We derive
a simple expression for this ratio that depends only upon the magnitude
density.

In Section III, we provide several examples of common circularly
symmetric densities (e.g. marginal densities are Pearson II, Pearson VII,
sinusoidal, and Gaussian) and we address the question of whether the
rectangular or the polar format scheme gives a smaller quantization error.
Presented at the Sixteenth Annual Allerton Conference on Communication,
Control, and Computing, October 4-6, 1978.


II. DEVELOPMENT

Consider the mean square quantization error $E_p$ of a polar format
representation,

$$E_p = \sum_{i=1}^{N_r} \sum_{j=1}^{N_\theta} \int_{c_{j-1}}^{c_j} \int_{a_{i-1}}^{a_i} \left[ r^2 + b_i^2 - 2 r b_i \cos(\theta - d_j) \right] \frac{f_r(r)}{2\pi}\, dr\, d\theta \qquad (1)$$

Implicit use has been made of the fact that in circularly symmetric
bivariate densities the magnitude random variable with probability density
$f_r(\cdot)$ is independent of the uniformly distributed $[-\pi, \pi]$ phase random
variable. The $b_i$ and $d_j$ are the output levels of the magnitude and phase
quantizers corresponding to input levels lying in the intervals $(a_{i-1}, a_i]$
and $(c_{j-1}, c_j]$, respectively. It is shown in [5] that the optimal phase
quantizer is the uniform quantizer. This allows us to simplify Eq. (1):

$$E_p = \sum_{i=1}^{N_r} \int_{a_{i-1}}^{a_i} \left[ r^2 + b_i^2 - 2 r b_i \frac{\sin(\pi/N_\theta)}{\pi/N_\theta} \right] f(r)\, dr \qquad (2)$$

Differentiating with respect to $b_i$, we find the optimum $b_i$ is

$$b_i = \frac{\sin(\pi/N_\theta)}{\pi/N_\theta} \cdot \frac{\int_{a_{i-1}}^{a_i} r f(r)\, dr}{\int_{a_{i-1}}^{a_i} f(r)\, dr} \qquad (3)$$

The equation given by Max for the output levels $b_i'$ of an optimal one
dimensional magnitude quantizer is found in [3] to be

$$b_i' = \frac{\int_{a_{i-1}'}^{a_i'} r f(r)\, dr}{\int_{a_{i-1}'}^{a_i'} f(r)\, dr} \qquad (4)$$

where the optimal input interval endpoints $a_i'$ (for the one dimensional
case) satisfy

$$a_i' = \frac{b_i' + b_{i+1}'}{2} \qquad (5)$$

If we minimize Eq. (2) with respect to the $a_i$, we then arrive at the
necessary condition (for the two dimensional case)

$$a_i = \frac{b_i + b_{i+1}}{2\,\dfrac{\sin(\pi/N_\theta)}{\pi/N_\theta}} = \frac{b_i' + b_{i+1}'}{2} = a_i' \qquad (6)$$

This equation indicates that the quantizer interval endpoints for the
optimum magnitude quantizer in the two dimensional case are the same as the
quantizer interval endpoints for the optimum one dimensional quantizer.
From Eqs. (3) and (4) and the preceding discussion, we have the following
relationship between the output levels $b_i'$ and $b_i$:

$$b_i = \frac{\sin(\pi/N_\theta)}{\pi/N_\theta}\, b_i' \qquad (7)$$

Consequently, Eq. (2) becomes

$$E_p = E\{r^2\} - \left( \frac{\sin(\pi/N_\theta)}{\pi/N_\theta} \right)^2 \sum_{i=1}^{N_r} (b_i')^2 \int_{a_{i-1}}^{a_i} f(r)\, dr \qquad (8)$$

where $E\{\cdot\}$ is the statistical expectation operator. In [6] it is shown
that the mean square quantization error for a minimum mean square error
quantizer is simply the input variance minus the output variance. If we
denote by $E_x^N$ the mean square quantization error produced by an optimal $N$
level quantizer for the random variable X, we may rewrite Eq. (8) as

$$E_p = \left( \frac{\sin(\pi/N_\theta)}{\pi/N_\theta} \right)^2 E_r^{N_r} + \left( 1 - \left( \frac{\sin(\pi/N_\theta)}{\pi/N_\theta} \right)^2 \right) E\{r^2\} \qquad (9)$$

Our problem now is one of characterizing the quantity $E_x^N$. Panter and
Dite [4] give a formula for the expected error of a minimum mean square
error quantizer with a large number of output levels and a smooth input
density. This formula is

$$E_x^N \approx \frac{K_x}{N^2}$$

where

$$K_x = \frac{1}{12} \left[ \int f(x)^{1/3}\, dx \right]^3 \qquad (10)$$


Roe [7] also derives some asymptotic formulas which were later used by Wood
[8] to rederive Eq. (10). Roe's formulas depend on the truncation of a
Taylor series expansion of the input density. Wood, in his formula,
explicitly states that the input density and the first few derivatives (up to
order five in some cases) must exist and be continuous. Panter and Dite,
in their derivation, require that as the input intervals become very small,
the density function may be approximated as a constant over each interval.
In [1] it is shown that a sufficient condition for Eq. (10) to hold is that
$f(x)$ be Riemann integrable and that $E\{x^{2+\delta}\} < \infty$ for some $\delta > 0$, in general
a much less severe restriction than continuity or differentiability.

We make use of the approximation

$$\left( \frac{\sin(\pi/N_\theta)}{\pi/N_\theta} \right)^2 \approx 1 - \frac{\pi^2}{3 N_\theta^2} \qquad (11)$$

and of Eq. (10) in order to reduce Eq. (9) to (taking the two components
to have unit variance, so that $E\{r^2\} = 2$)

$$E_p \approx \left( 1 - \frac{\pi^2}{3 N_\theta^2} \right) \frac{K_r}{N_r^2} + \frac{2\pi^2}{3 N_\theta^2} \qquad (12)$$

If we denote as $N$ the total number of output levels allowed to represent
the two dimensional random variable X, we have the following relation,

$$N = N_r N_\theta \qquad (13)$$

Since $K_r > 0$, it is simple to show that $N_r = O(N^{1/2})$ and $N_\theta = O(N^{1/2})$ by
differentiating Eq. (12) and solving for the optimal quantities. Making
use of this fact and Eq. (13), we may, assuming sufficiently large $N$, write
Eq. (12) as

$$E_p \approx \frac{K_r N_\theta^2}{N^2} + \frac{2\pi^2}{3 N_\theta^2} \qquad (14)$$

This is then optimized with respect to $N_\theta$ and yields the optimal $N_\theta$ as

$$N_\theta = \left( \frac{2\pi^2}{3 K_r} \right)^{1/4} N^{1/2} \qquad (15)$$

This leads to the following expression for the minimal attainable
asymptotic polar format error,

$$(E_p)_{\text{opt}} = \frac{2\pi}{N} \sqrt{\frac{2 K_r}{3}} \qquad (16)$$

Now consider the problem of optimally quantizing the random variable X
in a rectangular format. The mean square quantization error, $E_x$, of this
representation is given by

$$E_x = \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \int_{g_{j-1}}^{g_j} \int_{e_{i-1}}^{e_i} \left[ (x - f_i)^2 + (y - h_j)^2 \right] f(x, y)\, dx\, dy \qquad (17)$$

where $N_x$ and $N_y$ are the number of levels in each of the respective
orthogonal random variables. The other notation should be clear.

Equation (17) may be written as

$$E_x = \sum_{i=1}^{N_x} \int_{e_{i-1}}^{e_i} (x - f_i)^2 f_x(x)\, dx + \sum_{j=1}^{N_y} \int_{g_{j-1}}^{g_j} (y - h_j)^2 f_y(y)\, dy \qquad (18)$$

By symmetry arguments (since $f_x(x) = f_y(x)$), we may argue that
$N_x = N_y = N^{1/2}$. The quantizer that minimizes the above equation is simply
the minimum mean square error quantizer for each of the two components.
Therefore, again using Eq. (10), we have for large $N$

$$E_x \approx \frac{2 K_x}{N} \qquad (19)$$

where

$$K_x = \frac{1}{12} \left[ \int f(x)^{1/3}\, dx \right]^3$$

Comparing Eq. (19) and Eq. (16), we say that polar format is
asymptotically better than rectangular format if and only if

$$\frac{2 K_x}{N} > \frac{2\pi}{N} \sqrt{\frac{2 K_r}{3}}$$

or

$$K_x > \pi \sqrt{\frac{2 K_r}{3}} \qquad (20)$$

In other words, if the inequality is satisfied and the original input
probability density is Riemann integrable, then we are guaranteed that
there exists an $N_0$ such that for every $N > N_0$, polar format quantization
will perform better than rectangular format quantization.

If polar quantization is deemed best for a particular density, then
what is the ratio $N_\theta/N_r$ that provides the smallest total error? This
question is answered with the use of Eq. (15); we find

$$\left( \frac{N_\theta}{N_r} \right)_{\text{opt}} = \pi \sqrt{\frac{2}{3 K_r}} \qquad (21)$$

III.  EXAMPLES 

For our first example, we calculate the relevant parameters for a
random variable whose marginal density is of Pearson Type VII. This
distribution is a generalization of Student's t distribution. The bivariate
density is

$$f(x, y) = \frac{v\, 2^v (v-1)^v}{\pi \left( 2(v-1) + x^2 + y^2 \right)^{v+1}}, \qquad -\infty < x, y < \infty \qquad (22)$$

(with $v > 1$ in order to assure finite variance) and the marginal density
appears as

$$f(x) = \frac{2^v (v-1)^v\, \Gamma(v + 1/2)}{\sqrt{\pi}\, \Gamma(v) \left( 2(v-1) + x^2 \right)^{v+1/2}}, \qquad -\infty < x < \infty \qquad (23)$$

where $\Gamma(\cdot)$ is the gamma function, and where we have normalized the
distribution so that $f(x)$ has unit variance. The magnitude density is always
derived by substituting $r$ for $\sqrt{x^2 + y^2}$ in $f(x, y)$ and multiplying the
result by $2\pi r$, as shown by a simple change of variable. Eq. (23) yields
after some tedious algebra

$$K_x = \frac{2(v-1) \left[ B\!\left( \tfrac{1}{2};\, \tfrac{v-1}{3} \right) \right]^3}{12\, B\!\left( \tfrac{1}{2};\, v \right)} \qquad (24)$$

where $B(\cdot\,;\cdot)$ is the beta function. We perform similar operations with the
magnitude density to yield

$$K_r = \frac{v(v-1)}{24} \left[ B\!\left( \tfrac{2}{3};\, \tfrac{v-1}{3} \right) \right]^3 \qquad (25)$$

In Fig. 1, $K_x$ (solid line) and $\pi\sqrt{2K_r/3}$ (dotted line) are plotted as a
function of $v$ for values from 1.1 to 21.1. As shown by this graph, polar
format is always asymptotically best for this class of distributions. An
interesting point about this set of distributions is that in the limit as
$v \to \infty$ Eq. (23) converges to a unit variance Gaussian density. Therefore,
taking this limit in Eq. (24) and making use of Stirling's approximation,
we have

$$K_x \to \frac{\sqrt{3}\,\pi}{2} = 2.721 \qquad (26)$$

Wood [8] estimates this number as 2.73, which is close to our derived value.
From Eq. (25) we have similarly

$$K_r \to \frac{3}{8} \left[ \Gamma\!\left( \tfrac{2}{3} \right) \right]^3 = .931 \qquad (27)$$

which is the parameter for the Rayleigh distribution obtained in the limit.
Using these two values in Eq. (20), we conclude that asymptotically polar
formatting is better than rectangular formatting for Gaussian bivariate
densities. As a matter of interest, when we substitute the value of $K_r$
found in Eq. (27) into Eq. (21), we find the optimal ratio $N_\theta/N_r$ to be
2.658. Pearlman (reference number illegible in the source), using
distortion rate theory, states that this ratio
should be > 2.596, which is in agreement with our result.
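
The Gaussian-limit constants can be checked numerically. The sketch below is illustrative only: it evaluates the Panter and Dite constant $K = \frac{1}{12}[\int f^{1/3}]^3$ by a plain midpoint rule (the integration limits, step count and function names are assumptions) for a unit-variance Gaussian marginal and the corresponding Rayleigh magnitude density, then forms the polar-versus-rectangular comparison $K_x > \pi\sqrt{2K_r/3}$ and the ratio $\pi\sqrt{2/(3K_r)}$, both of which assume unit-variance components.

```python
import math

def panter_dite_constant(pdf, lo, hi, steps=200000):
    """K = (1/12) * (integral of pdf^(1/3))^3, by the midpoint rule."""
    width = (hi - lo) / steps
    total = sum(pdf(lo + (i + 0.5) * width) ** (1.0 / 3.0) for i in range(steps))
    return (total * width) ** 3 / 12.0

def gaussian(x):
    """Unit-variance Gaussian marginal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def rayleigh(r):
    """Magnitude density of two independent unit-variance Gaussians."""
    return r * math.exp(-r * r / 2.0)

K_x = panter_dite_constant(gaussian, -40.0, 40.0)   # limits chosen generously
K_r = panter_dite_constant(rayleigh, 0.0, 40.0)

polar_threshold = math.pi * math.sqrt(2.0 * K_r / 3.0)
ratio = math.pi * math.sqrt(2.0 / (3.0 * K_r))

print("K_x =", K_x, " K_r =", K_r)
print("polar format asymptotically better:", polar_threshold < K_x)
print("optimal N_theta / N_r =", ratio)
```

The two constants can be compared against the closed forms $\sqrt{3}\pi/2$ and $\frac{3}{8}\Gamma(2/3)^3$ quoted in the text.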

For  the  next  example,  consider  distributions  of  the  Pearson  II  class. 
The  bivariate  density  is 

\[ f(x,y) = \frac{v\,\left[2(v+1) - x^2 - y^2\right]^{v-1}}{\pi\, 2^v (v+1)^v}\; U\!\left(2(v+1) - x^2 - y^2\right) , \tag{28} \]


where $v > 0$, and $U(\cdot)$ is the unit step function. The marginal density is

\[ f(x) = \frac{\Gamma(v+1)\,\left[2(v+1) - x^2\right]^{v-\frac{1}{2}}\; U\!\left(2(v+1) - x^2\right)}{2^v (v+1)^v \sqrt{\pi}\;\Gamma\!\left(v+\tfrac{1}{2}\right)} . \tag{29} \]

For $v = 1/2$ we find that $f(x)$ has a uniform distribution. For $v = 1$, we
have that the bivariate density is uniform over a circular region in the
plane. Using Eq. (29), we find


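\[ K_x = \frac{2(v+1)\left[B\!\left(\tfrac{1}{2};\,\tfrac{2v+5}{6}\right)\right]^3}{12\,B\!\left(\tfrac{1}{2};\,v+\tfrac{1}{2}\right)} . \tag{30} \]

From the magnitude density we derive that

\[ K_r = \frac{v(v+1)}{24}\left[B\!\left(\tfrac{2}{3};\,\tfrac{v+2}{3}\right)\right]^3 . \tag{31} \]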


In Fig. 2 can be seen a plot of $K_x$ (solid line) and $\sqrt{2K_r/3}\,\pi$ (dotted line)
as a function of $v$ for values from 0 to 10. It should be noted that Eq.
(29) also converges to a Gaussian density as $v \to \infty$. It is a simple matter
to check that the expressions in Eqs. (30) and (31) indeed go to the correct
limits. From the plot it can be seen that for values of $v$ in the interval
(0.0, A) polar format is better. In the interval (A, 3.635) it is
seen that rectangular is better, and from 3.635 to $\infty$ polar again is better.
It appears then that for the circularly symmetric bivariate density whose
marginal density is uniform, we have the interesting result that rectangular
format is asymptotically better than polar format.
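The crossover near $v = 3.635$ can be located numerically. The following is a minimal sketch, not the authors' computation: it assumes $K_x$ and $K_r$ for the Pearson II class take the beta-function forms $2(v+1)B(\tfrac12;\tfrac{2v+5}{6})^3 / (12\,B(\tfrac12; v+\tfrac12))$ and $v(v+1)B(\tfrac23;\tfrac{v+2}{3})^3/24$, obtained here from $K = \tfrac{1}{12}(\int f^{1/3})^3$, and it assumes the comparison is between $K_x$ and $\pi\sqrt{2K_r/3}$. The helper names are hypothetical.

```python
import math


def beta(a: float, b: float) -> float:
    """Beta function B(a, b) computed through log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def k_x(v: float) -> float:
    """Assumed rectangular-component constant for the Pearson II marginal."""
    return 2.0 * (v + 1.0) * beta(0.5, (2.0 * v + 5.0) / 6.0) ** 3 / (
        12.0 * beta(0.5, v + 0.5)
    )


def k_r(v: float) -> float:
    """Assumed magnitude constant for the Pearson II magnitude density."""
    return v * (v + 1.0) * beta(2.0 / 3.0, (v + 2.0) / 3.0) ** 3 / 24.0


def gap(v: float) -> float:
    """Positive when polar format has the smaller asymptotic error."""
    return k_x(v) - math.pi * math.sqrt(2.0 * k_r(v) / 3.0)


def bisect_root(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Upper crossover between "rectangular better" and "polar better".
CROSSING = bisect_root(gap, 1.0, 20.0)

if __name__ == "__main__":
    print(f"gap at v = 1/2 (uniform marginal): {gap(0.5):+.4f}")
    print(f"upper crossover: v = {CROSSING:.3f}")
```

A negative gap at $v = 1/2$ corresponds to the statement that rectangular format wins when the marginal density is uniform.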


In  the  theoretical  development  and  in  the  examples  considered  so  far, 
we  have  constrained  the  class  of  quantizers  considered  to  two  different 
types,  the  rectangular  format  and  the  polar  format.  In  general,  neither 
of  these  schemes  will  be  optimal  for  an  arbitrary  two  dimensional  random 
variable with a circularly symmetric probability density. Zador [1] gives
an expression for the asymptotic mean square error $E_z$ of the optimal two
dimensional mean square error quantizer. This equation is

\[ E_z = C_z / N , \tag{32} \]

where $C_z$ is a constant determined by the density. For the Pearson VII
density $C_z = 4.0307\, v/(v-1)$; for the Pearson II density
$C_z = 4.0307\, v/(v+1)$. Since in the limit as $v$ becomes large, both of these
classes of densities converge to the Gaussian, the smallest error attainable
for a two dimensional normal random variable is approximately $4.0307/N$.

The best that we can do with a polar format representation is $4.95/N$ and
the best that we can do with a Cartesian format representation is $5.442/N$.
There is certainly room for improvement here; however, the important thing
to note is that the structure of the polar format quantizer is known while
that of the theoretical optimum quantizer is not.
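The three error coefficients for the Gaussian case can be reproduced numerically. This is a hedged sketch: the rectangular coefficient is taken as $2K_x$ (two components with $\sqrt{N}$ levels each), the polar coefficient as $2\pi\sqrt{2K_r/3}$ (optimal split of $N = N_r N_\theta$), and the optimum coefficient is assumed to be the hexagonal-cell constant $5/(18\sqrt{3})$ multiplied by $(\iint f^{1/2})^2 = 8\pi$ for the unit-variance Gaussian. These forms are assumptions made for this check and are not quoted from the text.

```python
import math

# Gaussian-limit constants (assumed closed forms).
K_X = math.sqrt(3.0) * math.pi / 2.0
K_R = 0.375 * math.gamma(2.0 / 3.0) ** 3

# Rectangular format: two components, sqrt(N) levels each -> 2 * K_x / N.
RECT_COEFF = 2.0 * K_X

# Polar format with the optimal split of N between magnitude and phase.
POLAR_COEFF = 2.0 * math.pi * math.sqrt(2.0 * K_R / 3.0)

# Assumed two-dimensional optimum: hexagonal-cell constant times
# (double integral of sqrt(f))**2, which equals 8*pi for the Gaussian.
ZADOR_COEFF = 5.0 / (18.0 * math.sqrt(3.0)) * 8.0 * math.pi

if __name__ == "__main__":
    print(f"rectangular: {RECT_COEFF:.4f} / N")
    print(f"polar:       {POLAR_COEFF:.4f} / N")
    print(f"optimum:     {ZADOR_COEFF:.4f} / N")
```

Under these assumptions polar format sits between the rectangular format and the two-dimensional optimum.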


In section II it was stated that a sufficient condition for Eq. (10)
to be valid is that the magnitude density function be Riemann integrable.
For most density functions of interest in modeling physical systems this
criterion is met. One group of densities that does not meet this condition
is the set of atomic densities, i.e., densities for which probability mass
is contained at a single point. In a circularly symmetric bivariate density,
the phase must be uniformly distributed on $(-\pi, \pi]$. The only quantity
that can be discrete is the magnitude distribution, i.e., we may have
"rings" of probability mass distributed in the plane. Suppose we have a
single "ring" of probability mass, where the radius of the ring is 1, i.e.,

\[ F(r) = U(r-1) , \tag{34} \]

where $F(\cdot)$ is the magnitude distribution function and $U(\cdot)$ is the unit
step function. The rectangular component marginal density is the sinusoidal
density

\[ f(x) = \frac{1}{\pi\sqrt{1 - x^2}} , \qquad |x| < 1 . \tag{35} \]

This density function is Riemann integrable, hence Eq. (10) and Eq. (19)
are valid. This implies the rectangular format error is $O(N^{-1})$. Now
consider the polar format case. For $N_r \ge 1$, the magnitude error $E_r = 0$.
This implies the polar format error for large $N$ is $O(N^{-2})$. Clearly then
polar format is asymptotically better for this density. By extending this
argument, we may say that if $P(r = 0) < 1$, then for any bivariate circularly
symmetric density with an atomic magnitude density with a finite number of
atoms, polar format will give a smaller asymptotic mean square quantization
error than rectangular format.


IV.  SUMMARY 

In  this  paper  we  have  derived  a  simple  criterion  to  determine  whether 
rectangular  format  or  polar  format  gives  smaller  mean  square  error  for 
circularly  symmetric  densities.  The  optimal  ratio  of  phase  quantizer 
levels  to  magnitude  quantizer  levels  is  also  derived.  Several  examples 
including the Gaussian case have been studied in detail.

It is interesting to note that polar format is not always better than
rectangular format even for the case of densities with circular symmetry.


This research has been supported by the Air Force Office of Scientific
Research under grant AFOSR 78-3605.


Fig. 1. The solid line is a plot of $K_x$ as a function of $v$; the dotted
line is a plot of $\sqrt{2K_r/3}\,\pi$ as a function of $v$ for the Pearson VII
density.


Fig. 2. The solid line is a plot of $K_x$ as a function of $v$; the dotted
line is a plot of $\sqrt{2K_r/3}\,\pi$ as a function of $v$ for the Pearson II
density.

REFERENCES 


[1] P. Zador, "Development and evaluation of procedures for quantizing
    multivariate distributions," Ph.D. dissertation, Stanford University,
    Stanford, CA, 1964.

[2] D. Chen, "On two or more dimensional optimum quantizers," The Aloha
    System Tech. Rep. A71-4, Univ. of Hawaii, Honolulu, Hawaii, Jan. 1971.

[3] J. Max, "Quantizing for minimum distortion," IEEE Trans. Info. Theory,
    Vol. IT-6, pp. 7-12, Jan. 1960.

[4] P. F. Panter and W. Dite, "Quantization distortion in pulse count
    modulation with nonuniform spacing of levels," Proc. IRE, Vol. 39,
    pp. 44-48, Jan. 1951.

[5] N. C. Gallagher, "Quantizing schemes for the discrete Fourier transform
    of a random time series," IEEE Trans. Info. Theory, Vol. IT-24,
    pp. 156-163, Mar. 1978.

[6] J. A. Bucklew and N. C. Gallagher, "A Note on Optimum Quantization,"
    to appear in IEEE Trans. on Info. Theory.

[7] G. M. Roe, "Quantizing for minimum distortion," IEEE Trans. Info.
    Theory, Vol. IT-10, pp. 384-385, Oct. 1964.

[8] R. C. Wood, "On optimum quantization," IEEE Trans. Info. Theory, Vol.
    IT-15, pp. 248-252, Mar. 1969.

[9] W. A. Pearlman, "Quantization Error Bounds for Computer Generated
    Holograms," Tech. Rep. #65031-1, Stanford University Information
    Systems Laboratory, Stanford, CA, August 1974.

QUANTIZATION  IN  SPECTRAL  PHASE  CODING 


Kerry  D.  Rinas  and  Neal  C.  Gallagher,  Jr. 

School  of  Electrical  Engineering 
Purdue  University 
West  Lafayette,  Indiana  47907 


ABSTRACT 

Spectral Phase Coding (SPC) is a robust suboptimum digital encoding scheme
utilizing the discrete Fourier transform. The quantization of
the SPC sequence $\{\psi_p\}$ is examined as an effective
quantization of the spectral magnitude and phase.
A new encoding technique called Prequantized
Spectral Phase Coding (PQSPC) is introduced.

PQSPC exhibits the same robust characteristics
as SPC with a reduction in MSE. For the case of
a double-sided exponential input density this
reduction in MSE is 47.5%.

I.  INTRODUCTION 

Spectral Phase Coding (SPC) is a robust suboptimum technique for coding a
nonstationary or large dynamic range discrete-time series into
digital form. In previous work [1], the performance of SPC in a mean
squared error sense has been evaluated. However, limited insight
is provided into the effects of the various SPC
parameters on overall performance. In this
paper, we investigate the effect of converting
the spectral magnitude and phase of the discrete
signal into the SPC sequence $\{\psi_p\}$ before quantization and
transmission. In section II, density functions for the magnitude and phase
errors at the receiver are obtained. These results suggest a method of
improving the SPC encoding algorithm. In section III, a technique
called Prequantizing is introduced. The addition of Prequantizing to SPC
offers a substantial improvement in the overall system performance.


p  •  0,1, 


The quantized sequence $\{\hat{\psi}_p\}$ is transmitted and
used at the receiver to recover the original
discrete signal. The reconstructed discrete
sequence is

•vCi  trj  <!<•  ’  * . *'•"»£ .  (J) 

This equation can be written in terms of the
equivalent magnitude and phase components at the
receiver.

rvCJ  <1  .'V  w 


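where

\[ \hat{\theta}_p = \tfrac{1}{2}\left(\hat{\psi}_p + \hat{\psi}_{p+M}\right) , \qquad
   \hat{\gamma}_p = \tfrac{1}{2}\left(\hat{\psi}_p - \hat{\psi}_{p+M}\right) . \tag{5} \]

We define $\hat{\theta}_p$ and $\hat{\gamma}_p$ to be the effective quantization
levels of $\theta_p$ and $\gamma_p$ that result when $\{\psi_p\}$
is formed. The effective quantization errors of
$\theta_p$ and $\gamma_p$ are defined in Eq. (6).

\[ e_p = \theta_p - \hat{\theta}_p , \qquad d_p = \gamma_p - \hat{\gamma}_p . \tag{6} \]

The effective errors $e_p$ and $d_p$ are the result of
using $\hat{\theta}_p$ and $\hat{\gamma}_p$ instead of $\theta_p$ and $\gamma_p$ to
reconstruct $\{a_n\}$ at the receiver.


It is also possible to reconstruct the discrete signal by sending quantized
values of $\theta_p$ and $\gamma_p$ directly. We define these quantized
values to be $\bar{\theta}_p$ and $\bar{\gamma}_p$ and the resulting
quantization error for this case is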


II.  SPECTRAL  PHASE  CODING  QUANTIZATION  ERROR 


Spectral Phase Coding uses the discrete
Fourier transform (DFT) to encode a discrete-time complex-valued random
sequence $\{a_n\}_{n=0}^{M-1}$ for
digital transmission. The SPC encoding and decoding algorithms are given
here. A detailed
explanation of the SPC procedure is available in
[2]. The spectral magnitude $A_p$ and the spectral
phase $\theta_p$ of the discrete sequence are given
below:

\[ \{a_n\}_{n=0}^{M-1} \;\overset{\mathrm{DFT}}{\longleftrightarrow}\; \{A_p e^{j\theta_p}\}_{p=0}^{M-1} . \tag{1} \]

SPC encodes the magnitude and phase of the spectrum by forming the sequence
$\{\psi_p\}_{p=0}^{2M-1}$ given by

\[ \psi_p = \theta_p + \gamma_p , \qquad \psi_{p+M} = \theta_p - \gamma_p , \qquad p = 0, 1, \ldots, M-1 . \tag{2} \]


\[ n_p = \theta_p - \bar{\theta}_p , \qquad m_p = \gamma_p - \bar{\gamma}_p . \tag{7} \]

The two sets of errors in Eqs. (6) and (7) are
compared to determine the effect of transmitting
$\{\psi_p\}$ rather than $\{\theta_p\}$ and $\{\gamma_p\}$ on the overall
performance. We find the effective quantization
errors $e_p$ and $d_p$ can be written as deterministic
functions of the individual quantization errors
$n_p$ and $m_p$. This is first demonstrated with two
simple examples. The quantizer has $N$ levels
uniformly spaced from 0 to $2\pi$ for both methods.

Example 1:

$N = 4$ with output levels $0, \pi/2, \pi, 3\pi/2$. Let
$\theta_p = 0.6\pi$ and $\gamma_p = 0.4\pi$; then we find that

\[ \bar{\theta}_p = 0.5\pi , \quad \bar{\gamma}_p = 0.5\pi , \quad \psi_p = \pi , \quad \psi_{p+M} = 0.2\pi . \]

Upon quantizing the values of $\psi_p$ and $\psi_{p+M}$, we
have

Presented at the 1979 Conference on Information Sciences and Systems, The Johns Hopkins University,
March 1979. To be published in Proceedings of this Conference.


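\[ \hat{\psi}_p = \pi , \qquad \hat{\psi}_{p+M} = 0.0 . \]

So,

\[ \hat{\theta}_p = 0.5\pi , \qquad \hat{\gamma}_p = 0.5\pi . \]

Consequently,

\[ e_p = 0.1\pi , \qquad n_p = 0.1\pi , \]
\[ d_p = -0.1\pi , \qquad m_p = -0.1\pi . \]

In this example, the effective quantization errors are the same as the
errors from direct quantization.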

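Example 2:

$N = 4$, $\theta_p = 0.7\pi$, $\gamma_p = 0.1\pi$; we find that

\[ \bar{\theta}_p = 0.5\pi , \quad \bar{\gamma}_p = 0.0 , \quad \psi_p = 0.8\pi , \quad \psi_{p+M} = 0.6\pi , \]
\[ \hat{\psi}_p = \pi , \quad \hat{\psi}_{p+M} = 0.5\pi , \quad \hat{\theta}_p = 0.75\pi , \quad \hat{\gamma}_p = 0.25\pi . \]

Consequently,

\[ e_p = -0.05\pi , \qquad n_p = 0.2\pi , \]
\[ d_p = -0.15\pi , \qquad m_p = 0.1\pi . \]

In this case the effective quantization errors
have different values than the direct quantization errors. We note that
the difference between $n_p$ and $e_p$ is $\pi/N$ and that the difference
between $m_p$ and $d_p$ is also $\pi/N$.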
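The two examples can be reproduced mechanically. The sketch below assumes a round-to-nearest uniform quantizer with step $2\pi/N$, the encoding $\psi_p = \theta_p + \gamma_p$, $\psi_{p+M} = \theta_p - \gamma_p$, and the receiver relations of Eq. (5); the function names are illustrative and not taken from the paper.

```python
import math


def quantize(angle: float, n_levels: int) -> float:
    """Round to the nearest multiple of 2*pi/n_levels."""
    step = 2.0 * math.pi / n_levels
    return step * math.floor(angle / step + 0.5)


def spc_errors(theta: float, gamma: float, n_levels: int):
    """Return (e, d, n, m): effective errors e, d obtained by quantizing the
    SPC sequence, and direct errors n, m obtained by quantizing theta and
    gamma themselves."""
    psi_hat_p = quantize(theta + gamma, n_levels)    # quantized psi_p
    psi_hat_pm = quantize(theta - gamma, n_levels)   # quantized psi_{p+M}
    theta_hat = 0.5 * (psi_hat_p + psi_hat_pm)       # effective phase level
    gamma_hat = 0.5 * (psi_hat_p - psi_hat_pm)       # effective gamma level
    e = theta - theta_hat
    d = gamma - gamma_hat
    n = theta - quantize(theta, n_levels)
    m = gamma - quantize(gamma, n_levels)
    return e, d, n, m


if __name__ == "__main__":
    for theta, gamma in ((0.6, 0.4), (0.7, 0.1)):
        e, d, n, m = spc_errors(theta * math.pi, gamma * math.pi, 4)
        print(f"theta={theta}pi gamma={gamma}pi: "
              f"e={e / math.pi:+.2f}pi d={d / math.pi:+.2f}pi "
              f"n={n / math.pi:+.2f}pi m={m / math.pi:+.2f}pi")
```

In the first case the effective and direct errors coincide; in the second they differ by $\pi/N$.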

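A detailed comparison of the two sets of
quantization errors is developed in the Appendix.
The results are given below:

\[ e_p = \begin{cases}
n_p , & -\frac{\pi}{N} \le n_p + m_p \le \frac{\pi}{N} \ \text{and}\ -\frac{\pi}{N} \le n_p - m_p \le \frac{\pi}{N} \\[4pt]
n_p + \frac{\pi}{N} , & -\frac{2\pi}{N} \le n_p + m_p \le -\frac{\pi}{N} \ \text{or}\ -\frac{2\pi}{N} \le n_p - m_p \le -\frac{\pi}{N} \\[4pt]
n_p - \frac{\pi}{N} , & \frac{\pi}{N} \le n_p + m_p \le \frac{2\pi}{N} \ \text{or}\ \frac{\pi}{N} \le n_p - m_p \le \frac{2\pi}{N}
\end{cases} \tag{8} \]

and

\[ d_p = \begin{cases}
m_p , & -\frac{\pi}{N} \le n_p + m_p \le \frac{\pi}{N} \ \text{and}\ -\frac{\pi}{N} \le n_p - m_p \le \frac{\pi}{N} \\[4pt]
m_p + \frac{\pi}{N} , & -\frac{2\pi}{N} \le n_p + m_p \le -\frac{\pi}{N} \ \text{or}\ \frac{\pi}{N} \le n_p - m_p \le \frac{2\pi}{N} \\[4pt]
m_p - \frac{\pi}{N} , & -\frac{2\pi}{N} \le n_p - m_p \le -\frac{\pi}{N} \ \text{or}\ \frac{\pi}{N} \le n_p + m_p \le \frac{2\pi}{N} .
\end{cases} \tag{9} \]

Examining Eq. (8), we see that the effective
phase quantization error $e_p$ is a function of both
the phase and magnitude errors $n_p$ and $m_p$. The
same is true for the effective error $d_p$.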
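The piecewise maps from the direct errors $(n_p, m_p)$ to the effective errors $(e_p, d_p)$ can be compared against a direct simulation of the quantizer. This is a sketch under assumptions: round-to-nearest quantization with step $2\pi/N$, $\theta_p$ drawn uniformly on $(0, 2\pi)$ and $\gamma_p$ on $(0, \pi/2)$, and the five-region rule written out in `effective_errors` as this sketch's reading of Eqs. (8) and (9). Names are hypothetical.

```python
import math
import random


def quantize(angle: float, n_levels: int) -> float:
    """Round to the nearest multiple of 2*pi/n_levels."""
    step = 2.0 * math.pi / n_levels
    return step * math.floor(angle / step + 0.5)


def effective_errors(n: float, m: float, n_levels: int):
    """Piecewise rule: map direct errors (n, m) to effective errors (e, d)."""
    a = math.pi / n_levels
    if n + m < -a:          # psi_p rounds one level down
        return n + a, m + a
    if n + m > a:           # psi_p rounds one level up
        return n - a, m - a
    if n - m < -a:          # psi_{p+M} rounds one level down
        return n + a, m - a
    if n - m > a:           # psi_{p+M} rounds one level up
        return n - a, m + a
    return n, m             # both stay on their direct levels


def max_mismatch(n_levels: int, trials: int, seed: int = 0) -> float:
    """Largest gap between simulated and piecewise effective errors."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        gamma = rng.uniform(0.0, 0.5 * math.pi)
        psi_hat_p = quantize(theta + gamma, n_levels)
        psi_hat_pm = quantize(theta - gamma, n_levels)
        e = theta - 0.5 * (psi_hat_p + psi_hat_pm)
        d = gamma - 0.5 * (psi_hat_p - psi_hat_pm)
        n = theta - quantize(theta, n_levels)
        m = gamma - quantize(gamma, n_levels)
        e_rule, d_rule = effective_errors(n, m, n_levels)
        worst = max(worst, abs(e - e_rule), abs(d - d_rule))
    return worst


if __name__ == "__main__":
    print("largest mismatch:", max_mismatch(8, 20000))
```

Only five of the nine conceivable region pairings occur, because $|n_p|$ and $|m_p|$ are each bounded by $\pi/N$.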

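The distribution functions of $e_p$ and $d_p$ can
be evaluated in terms of the joint density $f(n,m)$
by use of Eqs. (8) and (9). For $x < 0$,

\[ \begin{aligned}
F_e(x) ={}& P\{n_p \le x,\ -\tfrac{\pi}{N} \le n_p + m_p \le \tfrac{\pi}{N},\ -\tfrac{\pi}{N} \le n_p - m_p \le \tfrac{\pi}{N}\} \\
&+ P\{n_p \le x + \tfrac{\pi}{N},\ \tfrac{\pi}{N} \le n_p + m_p \le \tfrac{2\pi}{N}\} \\
&+ P\{n_p \le x + \tfrac{\pi}{N},\ \tfrac{\pi}{N} \le n_p - m_p \le \tfrac{2\pi}{N}\} .
\end{aligned} \]

This expression and a similar expression for
$F_d(x)$ lead to the results below. For $-\frac{\pi}{N} \le x < 0$,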

B*X  X  “  ” 

Fp(x)  •  /  /  f(n,m)  dn  dm 

0  m-jj 


*'*  ! 

-fi-x  -n 


f  ( n ,  m)  dn  dm 


f(n,m)  dn  da 


f(n,m)  dn  4s 


x  m* 

"h  0 


f(n,m)  dn  dm 


f(n,m)  dn  dm 


f(n,m)  dn 


it.  a 

A**  * 


f(n,m)  dn  dm 


We obtain similar results for $0 \le x \le \frac{\pi}{N}$.

The general results given in Eqs. (10) and
(11) can now be used to determine the densities
of the SPC errors $e_p$ and $d_p$. The properties of
the DFT indicate that for a large block size $M$,
$\theta_p$ and $\gamma_p$ will be independent with $\theta_p$ uniformly
distributed on $(0, 2\pi)$. Therefore, we assume that
$n_p$ and $m_p$ are statistically independent and that
$n_p$ is uniformly distributed on $(-\pi/N, \pi/N)$. The
densities of the equivalent quantization errors
$e_p$ and $d_p$ for the SPC case are given here. For
$-\pi/N < x < \pi/N$,


*»  0 


f(m)  dn  ♦  /  f(m)  dn 

-Mxi 


f(m)  ds  ♦  / 


f  (■)  d») 


\[ f_d(x) = \begin{cases}
\left(1 + \frac{N}{\pi}x\right)\left[f_m(x) + f_m\!\left(\frac{\pi}{N} + x\right)\right] , & x < 0 \\[4pt]
\left(1 - \frac{N}{\pi}x\right)\left[f_m(x) + f_m\!\left(x - \frac{\pi}{N}\right)\right] , & x > 0 .
\end{cases} \tag{13} \]

The density of $m_p$ is, in general, dependent upon
the statistics of the input signal. For a large
number of quantization levels $N$, we can assume
the density of $m_p$ to be uniform on $(-\pi/N, \pi/N)$. The
resulting error densities are shown in Figure 1.
The result is confirmed by computer simulation.

Figure 1. Effective Error Densities. (Panels: $f(e)$ and $f(d)$, each
plotted from $-\pi/N$ to $\pi/N$.)


Individual quantization of $\theta_p$ and $\gamma_p$ would yield
uniformly distributed error densities on $(-\pi/N, \pi/N)$ for the case
described above. Therefore
we conclude that preparing $\theta_p$ and $\gamma_p$ for digital
transmission by using the sequence $\{\psi_p\}$ represents an important
element in SPC performance.
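A short Monte Carlo run illustrates the difference between the effective errors and the errors of individual quantization. The sketch assumes round-to-nearest quantization with step $2\pi/N$, $\theta_p$ uniform on $(0, 2\pi)$, and $\gamma_p$ uniform on $(0, \pi/2)$ independent of $\theta_p$; under those assumptions a triangular density on $(-\pi/N, \pi/N)$ has mean square $(\pi/N)^2/6$ and a uniform one has $(\pi/N)^2/3$. The names and the choice $N = 8$ are illustrative.

```python
import math
import random


def quantize(angle: float, n_levels: int) -> float:
    """Round to the nearest multiple of 2*pi/n_levels."""
    step = 2.0 * math.pi / n_levels
    return step * math.floor(angle / step + 0.5)


def simulate_spc(n_levels: int, trials: int, seed: int = 0):
    """Return (E[e^2], E[d^2], E[n^2]) estimated from random inputs."""
    rng = random.Random(seed)
    sum_e2 = sum_d2 = sum_n2 = 0.0
    for _ in range(trials):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        gamma = rng.uniform(0.0, 0.5 * math.pi)
        psi_hat_p = quantize(theta + gamma, n_levels)
        psi_hat_pm = quantize(theta - gamma, n_levels)
        e = theta - 0.5 * (psi_hat_p + psi_hat_pm)   # effective phase error
        d = gamma - 0.5 * (psi_hat_p - psi_hat_pm)   # effective gamma error
        n = theta - quantize(theta, n_levels)        # direct phase error
        sum_e2 += e * e
        sum_d2 += d * d
        sum_n2 += n * n
    return sum_e2 / trials, sum_d2 / trials, sum_n2 / trials


if __name__ == "__main__":
    levels = 8
    a = math.pi / levels
    mse_e, mse_d, mse_n = simulate_spc(levels, 100000)
    print(f"E[e^2] / (a^2/6) = {mse_e / (a * a / 6):.3f}")
    print(f"E[d^2] / (a^2/6) = {mse_d / (a * a / 6):.3f}")
    print(f"E[n^2] / (a^2/3) = {mse_n / (a * a / 3):.3f}")
```

Ratios close to one indicate that the effective errors carry about half the mean square error of direct quantization in this idealized setting.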

Once we have evaluated the densities $f(e)$
and $f(d)$ the calculation of the mean squared
quantization error for $\theta_p$ and $\gamma_p$ is straightforward.
We now present expressions for computing this quantization error directly.
The expressions are obtained by computing the Fourier
series expansion for the quantization error of
$\{\psi_p\}$ and using the result with Eqs. (2) and
(5). The effective errors $e_p$ and $d_p$ are

\[ e_p = -\frac{2}{N} \sum_{n=1}^{\infty} \frac{(-1)^n}{n} \sin nN\theta_p \, \cos nN\gamma_p \tag{14} \]

and

\[ d_p = -\frac{2}{N} \sum_{n=1}^{\infty} \frac{(-1)^n}{n} \cos nN\theta_p \, \sin nN\gamma_p . \tag{15} \]

The mean squared error expressions for SPC are
obtained by assuming $\theta_p$ and $\gamma_p$ are independent
and $\theta_p$ is uniformly distributed on $(0, 2\pi)$. Thus
the effective mean squared errors are

\[ E\{e_p^2\} = \frac{1}{N^2} \sum_{n=1}^{\infty} \frac{1}{n^2} \left(1 + E\{\cos 2nN\gamma_p\}\right) \tag{16} \]

and

\[ E\{d_p^2\} = \frac{1}{N^2} \sum_{n=1}^{\infty} \frac{1}{n^2} \left(1 - E\{\cos 2nN\gamma_p\}\right) . \tag{17} \]

We have investigated the sequence $\{\psi_p\}$ in
terms of the effective quantization of $\theta_p$ and $\gamma_p$.
Effective quantization error densities and mean
squared error expressions have been found. These
results will be used in the following section to
improve the SPC performance.

III. PREQUANTIZED SPECTRAL PHASE CODING

We have stated at the outset that SPC is a
suboptimum technique for encoding discrete-time
signals. The results from [1] indicate that for
a fixed bit rate the number of magnitude quantization levels $N_1$, and the
number of phase levels $N_2$, must be related by

\[ N_2 = 2.596\, N_1 \tag{18} \]

for optimum performance. In SPC, $\gamma_p$ ranges from
0 to $\pi/2$ and $\theta_p$ ranges from 0 to $2\pi$. Thus $\gamma_p$ has
only one-fourth the effective quantization levels
of $\theta_p$ at the receiver. This suggests that an encoding tradeoff which
decreases the MSE on $\gamma_p$ at
the expense of increasing the MSE on $\theta_p$ could improve the SPC
performance. The previous results
offer a method of obtaining the desired tradeoff.

The effective errors $e_p$ and $d_p$ have been
shown to be functions of both $n_p$ and $m_p$ and thus
they are functions of both $\theta_p$ and $\gamma_p$. Suppose
$\theta'_p$ has a density function that minimizes the MSE
on $\gamma_p$ at the receiver. By shaping the density
of $\theta_p$ to be that of $\theta'_p$ before forming the sequence
$\{\psi_p\}$ we can lower the MSE on $\gamma_p$ at the
expense of increasing the MSE on $\theta_p$. We determine $\theta'_p$ as
given below. Using Eq. (15) we obtain

\[ E\{d_p^2\} = \frac{4}{N^2} \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} \frac{(-1)^{n+m}}{nm}\, E\{\cos nN\theta'_p \, \cos mN\theta'_p \, \sin nN\gamma_p \, \sin mN\gamma_p\} . \tag{19} \]

Thus the MSE on $\gamma_p$ assuming SPC statistics and a
large number of quantization levels $N$ is

\[ E\{d_p^2\} = \frac{1}{N^2} \sum_{n=1}^{\infty} \frac{1}{n^2} \left(1 + E\{\cos 2nN\theta'_p\}\right) . \tag{20} \]

From Eq. (20), we find that $E\{d_p^2\}$ is minimized
for

\[ \theta'_p = k\frac{\pi}{N} + \frac{\pi}{2N} , \qquad k \ \text{an integer} . \tag{21} \]

Applying these results, we propose the following coding scheme called
Prequantized Spectral
Phase Coding (PQSPC). First obtain $\theta_p$ and $\gamma_p$
as with SPC. $\theta_p$ is then input into a uniform
quantizer with output levels $K\pi/N + \pi/2N$ for
$K = 0, 1, \ldots, 2N-1$. This operation is called Prequantizing. The
quantizer output $\theta'_p$ is then
used to form the sequence $\{\psi_p\}$ and the rest
of the procedure is identical to SPC.

The techniques acquired in Section II are
applied to determine the effective error densities of PQSPC.

\[ f_e(x) = \frac{N}{2\pi} , \qquad -\frac{\pi}{N} \le x \le \frac{\pi}{N} \tag{22} \]

and

\[ f_d(x) = \begin{cases}
f_m(x) + f_m\!\left(\frac{\pi}{N} + x\right) , & -\frac{\pi}{2N} \le x < 0 \\[4pt]
f_m(x) + f_m\!\left(x - \frac{\pi}{N}\right) , & 0 < x \le \frac{\pi}{2N} .
\end{cases} \tag{23} \]

The tradeoff accomplished by Prequantizing can
be seen by comparing the above densities to
those evaluated in Figure 1. The MSE and range
of $d_p$ are both reduced by a factor of two at
the expense of $e_p$.
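The tradeoff can be illustrated by simulating both schemes side by side. This sketch makes several assumptions: round-to-nearest quantization of the SPC sequence with step $2\pi/N$, a prequantizer that maps $\theta_p$ to the nearest level $K\pi/N + \pi/2N$, $\theta_p$ uniform on $(0, 2\pi)$, $\gamma_p$ uniform on $(0, \pi/2)$, and errors measured against the original $\theta_p$ and $\gamma_p$. The function names and $N = 8$ are illustrative only.

```python
import math
import random


def quantize(angle: float, n_levels: int) -> float:
    """Round to the nearest multiple of 2*pi/n_levels."""
    step = 2.0 * math.pi / n_levels
    return step * math.floor(angle / step + 0.5)


def prequantize(theta: float, n_levels: int) -> float:
    """Map theta to the nearest level K*pi/N + pi/(2N)."""
    step = math.pi / n_levels
    return (math.floor(theta / step) + 0.5) * step


def simulate(n_levels: int, trials: int, use_prequantizer: bool, seed: int = 0):
    """Return (E[e^2], E[d^2]) for SPC or, optionally, PQSPC."""
    rng = random.Random(seed)
    sum_e2 = sum_d2 = 0.0
    for _ in range(trials):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        gamma = rng.uniform(0.0, 0.5 * math.pi)
        theta_in = prequantize(theta, n_levels) if use_prequantizer else theta
        psi_hat_p = quantize(theta_in + gamma, n_levels)
        psi_hat_pm = quantize(theta_in - gamma, n_levels)
        e = theta - 0.5 * (psi_hat_p + psi_hat_pm)
        d = gamma - 0.5 * (psi_hat_p - psi_hat_pm)
        sum_e2 += e * e
        sum_d2 += d * d
    return sum_e2 / trials, sum_d2 / trials


if __name__ == "__main__":
    spc = simulate(8, 100000, use_prequantizer=False)
    pqspc = simulate(8, 100000, use_prequantizer=True)
    print(f"E[d^2] PQSPC / SPC = {pqspc[1] / spc[1]:.3f}")
    print(f"E[e^2] PQSPC / SPC = {pqspc[0] / spc[0]:.3f}")
```

In this idealized model the error on $\gamma_p$ falls by about half while the error on $\theta_p$ roughly doubles, which is the direction of the tradeoff described above.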

The normalized MSE performance of PQSPC is
presented in Table I for a number of computer
simulations. The optimum unit variance
Gaussian quantizer (O.G.Q.), the optimum uniform unit variance Gaussian
quantizer (U.G.Q.),
and SPC performances are also presented in Table
I for comparison. All the quantizers have 32
levels and the block size for SPC and PQSPC is
64. N(A) is the Gaussian density and X(A) is
the double-sided exponential density. The input
densities are both zero mean with variance A.

In terms of normalized MSE, PQSPC offers an
improvement over SPC of 20.4% for the Gaussian
input densities, and 47.5% for the exponential
densities. A desirable characteristic of SPC
is its relative insensitivity to a change in
signal power or statistics. Table I demonstrates
that PQSPC shares this characteristic. In the
unit variance Gaussian case where the optimum
quantization scheme is given, the MSE of PQSPC
is just double that of the optimum MSE. Further,
for a significant change in the input signal
power or statistics, PQSPC often outperforms
that same quantizer.


Table I. A comparison of normalized MSE between the optimum unit variance
Gaussian quantizer (O.G.Q.), the optimum uniform unit variance Gaussian
quantizer (U.G.Q.), SPC, and Prequantized SPC (PQSPC).

Density    O.G.Q.      U.G.Q.      SPC         PQSPC
N(A)       2.45 E-3    3.82 E-3    7.39 E-3    5.88 E-3
N(A)       6.76 E-3    1.23 E-2    7.39 E-3    5.88 E-3
N(A)       3.63 E-3    5.43 E-2    7.39 E-3    5.88 E-3
X(A)       1.81 E-2    2.65 E-2    2.78 E-2    1.46 E-2
X(A)       5.08 E-2    6.77 E-2    2.78 E-2    1.46 E-2
X(A)       1.13 E-1    1.40 E-1    2.78 E-2    1.46 E-2


IV.  DISCUSSION 

We began with an investigation of quantization in SPC. We have found error
densities and MSE equations that completely characterize the
quantization. The results of this investigation
indicate that additional quantization can lead
to improved MSE performance. This is an interesting concept as it does not
follow simply from intuition. Using the concept of additional
quantization, a technique called Prequantized
Spectral Phase Coding is introduced. It is shown
that PQSPC has the same properties as SPC with
substantially reduced MSE. Finally, computer
examples indicate that PQSPC is often superior
to fixed quantization for nonstationary or large
dynamic range signals.

ACKNOWLEDGEMENT

The authors gratefully acknowledge the support of the Air Force Office of
Scientific Research under grant AFOSR 78-3605.

APPENDIX 

DERIVATION  OF  EQUIVALENT  QUANTIZATION  ERROR 
EXPRESSIONS 

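All quantization is to $N$ levels uniformly
spaced from 0 to $2\pi$ with output levels $K\,2\pi/N$
for $K = 0, 1, \ldots, N-1$. Using Eqs. (2) and (7) we
write $\psi_p$ in terms of the direct quantization
levels $\bar{\theta}_p$ and $\bar{\gamma}_p$.

\[ \psi_p = \bar{\theta}_p + n_p + \bar{\gamma}_p + m_p \]
\[ \psi_{p+M} = \bar{\theta}_p + n_p - \bar{\gamma}_p - m_p \tag{A1} \]

Since $\bar{\theta}_p$ and $\bar{\gamma}_p$ represent quantized values,

\[ \bar{\theta}_p + \bar{\gamma}_p = k\frac{2\pi}{N} , \qquad k \ \text{an integer} . \]

Thus, an equivalent way of expressing $\psi_p$ before
quantization is

\[ \psi_p = k\frac{2\pi}{N} + n_p + m_p . \tag{A2} \]

Note that $|n_p| \le \pi/N$, $|m_p| \le \pi/N$ and thus

\[ -\frac{2\pi}{N} \le n_p + m_p \le \frac{2\pi}{N} . \tag{A3} \]

The quantization of $\psi_p$ is now described.

\[ \hat{\psi}_p = \begin{cases}
k\frac{2\pi}{N} , & -\frac{\pi}{N} \le n_p + m_p \le \frac{\pi}{N} \\[4pt]
k\frac{2\pi}{N} - \frac{2\pi}{N} , & -\frac{2\pi}{N} \le n_p + m_p \le -\frac{\pi}{N} \\[4pt]
k\frac{2\pi}{N} + \frac{2\pi}{N} , & \frac{\pi}{N} \le n_p + m_p \le \frac{2\pi}{N}
\end{cases} \tag{A4} \]

Recalling $k\,2\pi/N = \bar{\theta}_p + \bar{\gamma}_p$, we write Eq. (A4) as

\[ \hat{\psi}_p = \begin{cases}
\bar{\theta}_p + \bar{\gamma}_p , & -\frac{\pi}{N} \le n_p + m_p \le \frac{\pi}{N} \\[4pt]
\bar{\theta}_p + \bar{\gamma}_p - \frac{2\pi}{N} , & -\frac{2\pi}{N} \le n_p + m_p \le -\frac{\pi}{N} \\[4pt]
\bar{\theta}_p + \bar{\gamma}_p + \frac{2\pi}{N} , & \frac{\pi}{N} \le n_p + m_p \le \frac{2\pi}{N} .
\end{cases} \tag{A5} \]

Similarly,

\[ \hat{\psi}_{p+M} = \begin{cases}
\bar{\theta}_p - \bar{\gamma}_p , & -\frac{\pi}{N} \le n_p - m_p \le \frac{\pi}{N} \\[4pt]
\bar{\theta}_p - \bar{\gamma}_p - \frac{2\pi}{N} , & -\frac{2\pi}{N} \le n_p - m_p \le -\frac{\pi}{N} \\[4pt]
\bar{\theta}_p - \bar{\gamma}_p + \frac{2\pi}{N} , & \frac{\pi}{N} \le n_p - m_p \le \frac{2\pi}{N} .
\end{cases} \tag{A6} \]

Using Eqs. (5) and (6) we write $e_p$ in terms of
$\hat{\psi}_p$ and $\hat{\psi}_{p+M}$:

\[ e_p = \theta_p - \hat{\theta}_p = \theta_p - \tfrac{1}{2}\left(\hat{\psi}_p + \hat{\psi}_{p+M}\right) . \tag{A7} \]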

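We examine three examples here for clarity and
then state the general results.

Case 1:

\[ -\frac{\pi}{N} \le n_p + m_p \le \frac{\pi}{N} , \qquad -\frac{\pi}{N} \le n_p - m_p \le \frac{\pi}{N} . \]

Thus, $\hat{\psi}_p = \bar{\theta}_p + \bar{\gamma}_p$, $\hat{\psi}_{p+M} = \bar{\theta}_p - \bar{\gamma}_p$, and

\[ e_p = \theta_p - \tfrac{1}{2}\left[(\bar{\theta}_p + \bar{\gamma}_p) + (\bar{\theta}_p - \bar{\gamma}_p)\right] = \theta_p - \bar{\theta}_p = n_p . \]

Case 2:

\[ -\frac{2\pi}{N} \le n_p + m_p \le -\frac{\pi}{N} , \qquad -\frac{\pi}{N} \le n_p - m_p \le \frac{\pi}{N} . \]

Thus, $\hat{\psi}_p = \bar{\theta}_p + \bar{\gamma}_p - \frac{2\pi}{N}$, $\hat{\psi}_{p+M} = \bar{\theta}_p - \bar{\gamma}_p$, and

\[ e_p = \theta_p - \tfrac{1}{2}\left[(\bar{\theta}_p + \bar{\gamma}_p - \tfrac{2\pi}{N}) + (\bar{\theta}_p - \bar{\gamma}_p)\right] = \theta_p - \bar{\theta}_p + \frac{\pi}{N} = n_p + \frac{\pi}{N} . \]

Case 3:

\[ \frac{\pi}{N} \le n_p + m_p \le \frac{2\pi}{N} , \qquad -\frac{\pi}{N} \le n_p - m_p \le \frac{\pi}{N} . \]

Thus, $\hat{\psi}_p = \bar{\theta}_p + \bar{\gamma}_p + \frac{2\pi}{N}$, $\hat{\psi}_{p+M} = \bar{\theta}_p - \bar{\gamma}_p$, and

\[ e_p = \theta_p - \tfrac{1}{2}\left[(\bar{\theta}_p + \bar{\gamma}_p + \tfrac{2\pi}{N}) + (\bar{\theta}_p - \bar{\gamma}_p)\right] = \theta_p - \bar{\theta}_p - \frac{\pi}{N} = n_p - \frac{\pi}{N} . \]

There are five possible pairings of $\hat{\psi}_p$ and $\hat{\psi}_{p+M}$, since
the constraints $|n_p| \le \pi/N$ and $|m_p| \le \pi/N$
show that four of the nine available pairings
are not allowed. The complete results for $e_p$
are given in Eq. (A8). The results for $d_p$ given
in Eq. (9) are obtained in a similar manner.

\[ e_p = \begin{cases}
n_p , & -\frac{\pi}{N} \le n_p + m_p \le \frac{\pi}{N} \ \text{and}\ -\frac{\pi}{N} \le n_p - m_p \le \frac{\pi}{N} \\[4pt]
n_p + \frac{\pi}{N} , & -\frac{2\pi}{N} \le n_p + m_p \le -\frac{\pi}{N} \ \text{or}\ -\frac{2\pi}{N} \le n_p - m_p \le -\frac{\pi}{N} \\[4pt]
n_p - \frac{\pi}{N} , & \frac{\pi}{N} \le n_p + m_p \le \frac{2\pi}{N} \ \text{or}\ \frac{\pi}{N} \le n_p - m_p \le \frac{2\pi}{N} .
\end{cases} \tag{A8} \]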

REFERENCES 

[1] N.C. Gallagher, Jr., "Quantization Schemes
    of the DFT of a Random Time-Series," IEEE
    Trans. on Info. Theory, IT-24, pp. 156-163,
    (1978).


[2] N.C. Gallagher, Jr., "Spectral Phase Coding,"
    Proc. of Johns Hopkins CISS, April
    1976.


IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-25, NO. 3, MAY 1979


A Note on Optimal Quantization

JAMES A. BUCKLEW and NEAL C. GALLAGHER, JR.,
MEMBER, IEEE


ra  the  variance  at  the 
value  k  preserved  by 


I.  Introduction 


J. Max [1] is generally credited with being the first to consider
the problem of designing a quantizer to minimize a distortion
measure given that the input statistics are known. Max derives
necessary conditions for minimizing the mean square quantization
error. These results are summarized in the following equations:

\[ x_j = \frac{y_j + y_{j+1}}{2} \tag{1} \]

\[ y_j = \int_{x_{j-1}}^{x_j} x f(x)\,dx \Big/ P(x_{j-1} < x \le x_j) \tag{2} \]

where $f(x)$ is the probability density of the variable to be
quantized and $P(x_{j-1} < x \le x_j)$ is the probability that $x$ lies in the
interval $(x_{j-1}, x_j]$. The $y_j$ are output levels and the $x_j$ are the
break points where an input value between $x_{j-1}$ and $x_j$ is
quantized to $y_j$. Fleischer [2] later gave a sufficient condition for
Max's equations to be the optimal set.

Typically, the above equations are intractable except for simple
input densities, causing some researchers to derive approximate
formulae for some common densities. Roe [3] derives an
approximation for the input interval endpoints assuming that the
widths of these intervals are small, i.e., the number of output
levels is large. Wood [5] derives a result which states, in effect,
that the variance of the output of a minimum mean-square error
quantizer should be less than the input variance. He also states
that the significance of his result is that the signal and noise are
dependent and that no pseudo-independence of the sort considered
by Widrow [4] is possible.

However,  Wood's  derivation  assumes  the  input  density  to  be 
five  times  differentiable  and  that  the  quantizer  input  intervals  be 
very  small  in  order  to  truncate  various  Taylor  series  expansions. 
Furthermore,  the  derived  expression  for  the  output  variance  is 
dependent  upon  the  input  interval  lengths  and  the  input  proba¬ 
bility  density  function  evaluated  at  the  midpoints  of  these  inter¬ 
vals. 

In  this  note  we  derive  a  generalization  of  Wood’s  results  that 
eliminates  a  number  of  his  approximations  and  generalizes  the 
results  to  apply  to  more  than  just  Max  quantizers. 


Manuscript received May 5, 1978; revised September 5, 1978. This work
was supported in part by the National Science Foundation under Grant
ENG-7682426 and in part by the Air Force Office of Scientific Research, Air
Force Systems Command, USAF, under Grant AFOSR-78-3605.

The authors are with the School of Engineering, Purdue University, West
Lafayette, IN 47907.


0018-9448/79/0500-0365$00.75 ©1979 IEEE


II.  Development 

In the sequel it is assumed all random variables have finite
second moments.

Property 1: The mean value of the output of a minimum
mean-square error quantizer is equal to the mean value of the
input.

Proof: Consider (3):

\[ P(x_{j-1} < x \le x_j)\, y_j = \int_{x_{j-1}}^{x_j} x f(x)\,dx . \tag{3} \]

Sum both sides of the equation from $j = 1$ to $j = N$. The result
follows. Property 1 allows us to assume the input density has
zero mean without loss of generality.

Property 2: The variance of the output of a minimum mean-square
error quantizer is always less than or equal to the input
variance. Furthermore the mean-square quantization error is
given by the difference of the two.

Proof: Let us consider the mean-square error $\epsilon$ between the
quantizer's input and output:

\[ \epsilon = \sum_{i=1}^{N} \int_{x_{i-1}}^{x_i} (y_i - x)^2 f(x)\,dx \tag{4} \]

where $x_0$ and $x_N$ are the smallest and largest values taken on by
the input density and may take on the values $-\infty$ and $+\infty$,
respectively. Expanding the integrand and using the expression

\[ \sum_{i=1}^{N} \int_{x_{i-1}}^{x_i} x^2 f(x)\,dx = E\{x^2\} = \sigma_x^2 \]

and (3), we find that

\[ \epsilon = \sigma_x^2 - \sum_{i=1}^{N} y_i^2\, P(x_{i-1} < x \le x_i) \tag{5} \]

where $E\{\cdot\}$ is the statistical expectation operator. But we notice
that the last term on the right is the variance of the output, $\sigma_y^2$.
Since $\epsilon \ge 0$, this implies

\[ \sigma_x^2 \ge \sigma_y^2 . \tag{6} \]

Property 3: The signal and quantization noise are always
nonpositively correlated at the output of the minimum mean-square
error quantizer.

Proof: Consider an additive noise model for the quantizing
error; by Property 2,

\[ E\{(x + n)^2\} = E\{x^2\} + 2E\{xn\} + E\{n^2\} \le E\{x^2\} . \tag{7} \]

This implies that $E\{xn\} \le 0$. Therefore, since $x$ has mean zero,
the correlation coefficient must be nonpositive.

Remark: The above proofs depend only upon the output
levels being chosen as the conditional means of the input intervals.
Therefore, the same theorem applies to the maximum
entropy and equal interval quantizers when the output levels $y_j$
are chosen as above. As indicated by an anonymous reviewer,
Property 2 may also be easily derived by averaging the conditional
mean square error over all the quantization intervals
where we condition on the event of being in one particular
quantization interval.
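The three properties can be checked on an empirical distribution, where they hold exactly up to rounding. The sketch below is an illustration under assumptions: it uses arbitrary, equally spaced break points (not Max's optimal ones) and sets each output level to the sample mean of its cell, which is the only condition the Remark says the proofs need. The function names and the Gaussian test data are hypothetical choices.

```python
import bisect
import random


def conditional_mean_quantizer(samples, breakpoints):
    """Quantize each sample to the mean of all samples in its cell.
    Cells are the intervals (x_{j-1}, x_j] defined by sorted breakpoints."""
    cells = {}
    for x in samples:
        cells.setdefault(bisect.bisect_left(breakpoints, x), []).append(x)
    levels = {j: sum(v) / len(v) for j, v in cells.items()}
    return [levels[bisect.bisect_left(breakpoints, x)] for x in samples]


def mean(values):
    return sum(values) / len(values)


def variance(values):
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def check_properties(samples, breakpoints):
    """Return (mean gap, variance identity gap, E[x*n] + mse)."""
    outputs = conditional_mean_quantizer(samples, breakpoints)
    noise = [y - x for x, y in zip(samples, outputs)]
    mse = mean([n * n for n in noise])
    mean_gap = mean(outputs) - mean(samples)                    # Property 1
    var_gap = variance(samples) - variance(outputs) - mse       # Property 2
    corr_gap = mean([x * n for x, n in zip(samples, noise)]) + mse  # Property 3
    return mean_gap, var_gap, corr_gap


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    print(check_properties(data, [-1.0, 0.0, 1.0]))
```

All three returned gaps should be zero to within floating-point error; the last one shows that $E\{xn\}$ equals minus the mean-square error, so it is nonpositive.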

III.  Discussion 

Some  interesting  observations  can  be  made  when  these  results 
are  compared  with  the  recent  papers  of  Wise  et  al.  [6]  and  Sripad 
and  Snyder  [?}.  In  |6}  it  is  shown  that  the  rms  bandwidth  of  any 
(stationary)  Gaussian  process  must  always  increase  on  passing 
through  a  memoryless  nonlinearity.  By  using  the  result  of  Wise 
et  al.,  we  can  say  that  a  Max  quantizer  operating  on  a  stationary 
Gaussian  input  increases  the  rms  bandwidth  while  simulta¬ 
neously  reducing  the  variance. 

In [7], Sripad and Snyder develop necessary and sufficient
conditions for the quantization error to have a uniform distribution.
In addition, they derive sufficient conditions for the signal
and quantization error to be uncorrelated given that the error is
uniformly distributed. Consider the case where the random
variable  to  be  quantized  is  uniformly  distributed.  The  Max 
quantizer  for  this  case  is  the  equal  step  size  quantizer.  It  is  found 
that  the  uniform  input  density  satisfies  the  conditions  for  the 
quantization  error  to  be  uniform  but  fails  the  conditions  for 
uncorrelatedness.  The  results  contained  herein  confirm  this  re¬ 
sult  and  in  fact  tell  us  the  signal  and  noise  are  strictly  negatively 
correlated. 
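
The uniform-input case discussed above can be checked numerically. The following Python sketch is an illustration only and is not part of the original note; the function name, the input range [-1, 1], and the choice of four levels are assumptions made here. It uses the fact that for a uniform density the conditional mean of each cell is its midpoint, and evaluates the input variance, the output variance, and the cross moment E{xn} with n = y - x.

```python
def uniform_max_quantizer_stats(n_levels, half_width=1.0):
    """Moments for x ~ Uniform(-half_width, half_width) quantized to cell midpoints."""
    step = 2.0 * half_width / n_levels
    density = 1.0 / (2.0 * half_width)
    var_in = var_out = cross = 0.0
    for i in range(n_levels):
        lo = -half_width + i * step
        hi = lo + step
        y = 0.5 * (lo + hi)          # conditional mean of a uniform cell
        prob = density * step        # probability of falling in this cell
        var_in += density * (hi**3 - lo**3) / 3.0      # integral of x^2 f(x)
        var_out += prob * y * y                        # contribution to E{y^2}
        # integral of x (y - x) f(x) over the cell, i.e. the cell's share of E{xn}
        cross += density * (y * (hi**2 - lo**2) / 2.0 - (hi**3 - lo**3) / 3.0)
    return var_in, var_out, cross


var_in, var_out, cross = uniform_max_quantizer_stats(4)
print("input variance :", var_in)
print("output variance:", var_out)
print("E{xn}          :", cross)
```

Under these assumptions the cross moment comes out equal to minus the mean square error (step squared over 12), so the signal and the noise are strictly negatively correlated and the output variance is smaller than the input variance.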

Acknowledgment 

We  express  our  thanks  to  Ed  Delp  for  posing  the  problem. 

References

[1] J. Max, "Quantizing for minimum distortion," IRE Trans. Inform. Theory, vol. IT-6, pp. 7-12, Mar. 1960.

[2] P. E. Fleischer, "Sufficient conditions for achieving minimum distortion in a quantizer," IEEE Int. Conv. Rec., pt. 1, pp. 104-111, 1964.

[3] G. M. Roe, "Quantizing for minimum distortion," IEEE Trans. Inform. Theory, vol. IT-10, pp. 384-385, Oct. 1964.

[4] B. Widrow, "A study of rough amplitude quantization by means of Nyquist sampling theory," IRE Trans. Circuit Theory, vol. CT-3, pp. 226-276, Dec. 1956.

[5] R. C. Wood, "On optimum quantization," IEEE Trans. Inform. Theory, vol. IT-5, pp. 248-252, Mar. 1969.

[6] G. L. Wise, A. P. Traganitis, and J. B. Thomas, "The effect of a memoryless nonlinearity on the spectrum of a random process," IEEE Trans. Inform. Theory, vol. IT-23, pp. 84-89, Jan. 1977.

[7] A. B. Sripad and D. L. Snyder, "A necessary and sufficient condition for quantization errors to be uniform and white," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, pp. 442-448, Oct. 1977.




SOME  PROPERTIES  OF  UNIFORM  STEP  SIZE  QUANTIZERS* 

JAMES  A.  BUCKLEW  and  NEAL  C.  GALLAGHER,  JR. 
School  of  Electrical  Engineering 
Purdue  University 
West  Lafayette,  IN  47907 


This  research  was  supported  by  The  Air  Force  Office  of  Scientific  Research 
under  grant  AFOSR  78-3605. 

ABSTRACT 

This paper treats some properties of the optimal mean square error uniform quantizer (OUQ). It is shown for the OUQ that the mean square error (mse) is given by the input variance minus the output variance. It is shown that $\lim_{N\to\infty} \mathrm{mse}/(\Delta^2/12) \ge 1$, with equality when the support of the random variable is contained in a finite interval. A class of probability densities which have the above limit greater than 1 is given. It is shown that $\lim_{N\to\infty} N^2\,\mathrm{mse} = (b-a)^2/12$, where (b-a) is the measure of the smallest interval that contains the support of the input random variable.

In many problems arising in the evaluation or design of a control or communication system, it becomes necessary to predict the performance of a uniform quantizer. Uniform quantizers are of interest because they are usually the simplest quantizer structure to implement. The study of uniform quantization is also of interest because many noise processes in physical systems may be considered to be the noise produced by a uniform quantizing operation. For example, the final position of a stepping motor or the line drawn by the pen of a computer plotting device under a continuous control may be considered to be corrupted by a uniform quantizing operation.

Because of the importance of these quantizers, several authors have considered various properties of them. Widrow [1] shows that under certain conditions on the characteristic function of the input random variable, the quantization noise is uniformly distributed. Gish and Pierce [2] show that asymptotically the uniform quantizer is optimum in the sense of minimizing the output entropy subject to a fixed mean square error value. Sripad and Snyder [3] later extend Widrow's work to give a sufficient condition for when the quantization error is uniform and uncorrelated with the input random variable.

We will now state and prove some additional properties of these quantizers when we design them to minimize the mean square error. We may write down the analytic expression for the quantizer characteristic g(x) as

$$g(x) = \begin{cases} a, & x \le q \\ a + (i+1)\Delta, & q + i\Delta < x \le q + (i+1)\Delta, \quad i = 0, \ldots, N-3 \\ a + (N-1)\Delta, & x > q + (N-2)\Delta \end{cases} \qquad (1)$$

where N is the number of output levels in the quantizer. We see that if x is less than q or greater than $q + (N-2)\Delta$, then x is truncated to $a$ and $a + (N-1)\Delta$ respectively. An important parameter of interest is the width of the nontruncation region, which equals $(N-2)\Delta$.
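
As an illustration of the characteristic in Eq. (1), the following Python sketch evaluates g(x) for given parameters. It is a minimal sketch, not the authors' implementation; the function name `uniform_quantizer`, the argument names, and the convention that each interior cell is open on the left and closed on the right are assumptions made here (the boundary convention has no effect on the mean square error when the input density has no point masses).

```python
import math


def uniform_quantizer(x, q, a, step, n_levels):
    """Evaluate an N-level uniform quantizer characteristic g(x).

    q      : left edge of the nontruncation region
    a      : lowest output level
    step   : step size (Delta)
    """
    if n_levels < 2:
        raise ValueError("need at least two output levels")
    if x <= q:                                   # lower truncation region
        return a
    if x > q + (n_levels - 2) * step:            # upper truncation region
        return a + (n_levels - 1) * step
    # interior cell i satisfies q + i*step < x <= q + (i+1)*step
    i = math.ceil((x - q) / step) - 1
    i = min(max(i, 0), n_levels - 3)             # guard against rounding at the edges
    return a + (i + 1) * step


# Hypothetical parameters: four levels -0.5, 0.5, 1.5, 2.5 with thresholds 0, 1, 2.
for x in (-3.0, 0.3, 1.7, 10.0):
    print(x, uniform_quantizer(x, q=0.0, a=-0.5, step=1.0, n_levels=4))
```

With these hypothetical parameters the nontruncation region is (0, 2], of width (N-2) times the step size.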

The quantizer characteristic g(x) must be optimized with respect to three parameters: q, which fixes its position along the x axis; a, which fixes its position along the y axis; and $\Delta$, which specifies the step size of the quantizer. Because it makes little sense to speak of minimizing the mean square error of a random variable with infinite variance, we will always assume $\int_{-\infty}^{\infty} x^2 f(x)\,dx < \infty$.

*Presented at the Seventeenth Annual Allerton Conference on Communication, Control, and Computing, October 10-12, 1979.

Property  1 


The  optimum  uniform  quantizer  preserves  the  mean  of  the  input  random 
variable. 

Proof: 

Suppose g(x) is the optimum uniform quantizer. Then we must have

$$\frac{d}{d\epsilon} \int (x - g(x) + \epsilon)^2 f(x)\,dx \,\Big|_{\epsilon=0} = 0. \qquad (2)$$

This implies

$$\int x f(x)\,dx = \int g(x) f(x)\,dx.$$


Property  2 


For  the  optimum  uniform  quantizer  we  have  that 


$a = q - \Delta/2$.


Proof: 


Suppose  g(x)  is  the  optimum  uniform  quantizer.  Then  we  must  have 
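
$$\frac{d}{d\epsilon} \int (g(x-\epsilon) - x)^2 f(x)\,dx \,\Big|_{\epsilon=0} = 0.$$

Since the term in $x^2$ does not depend on $\epsilon$, this is

$$\frac{d}{d\epsilon} \left[ \int g(x-\epsilon)^2 f(x)\,dx - \int 2x\,g(x-\epsilon) f(x)\,dx \right]_{\epsilon=0} = 0,$$

or, writing out g,

$$\frac{d}{d\epsilon} \Bigg[ \sum_{i=0}^{N-3} (a+(i+1)\Delta)^2 \int_{q+\epsilon+i\Delta}^{q+\epsilon+(i+1)\Delta} f(x)\,dx + a^2 \int_{-\infty}^{q+\epsilon} f(x)\,dx + (a+(N-1)\Delta)^2 \int_{q+\epsilon+(N-2)\Delta}^{\infty} f(x)\,dx$$
$$\qquad - 2 \sum_{i=0}^{N-3} (a+(i+1)\Delta) \int_{q+\epsilon+i\Delta}^{q+\epsilon+(i+1)\Delta} x f(x)\,dx - 2a \int_{-\infty}^{q+\epsilon} x f(x)\,dx - 2(a+(N-1)\Delta) \int_{q+\epsilon+(N-2)\Delta}^{\infty} x f(x)\,dx \Bigg]_{\epsilon=0} = 0.$$

Carrying out the differentiation and setting $\epsilon = 0$ gives

$$\sum_{i=0}^{N-3} (a+(i+1)\Delta)^2 \left[ f(q+(i+1)\Delta) - f(q+i\Delta) \right] + a^2 f(q) - (a+(N-1)\Delta)^2 f(q+(N-2)\Delta)$$
$$\qquad - 2 \Bigg\{ \sum_{i=0}^{N-3} (a+(i+1)\Delta) \left[ (q+(i+1)\Delta) f(q+(i+1)\Delta) - (q+i\Delta) f(q+i\Delta) \right] + a q f(q) - (a+(N-1)\Delta)(q+(N-2)\Delta) f(q+(N-2)\Delta) \Bigg\} = 0.$$

Simplifying this expression we obtain

$$(\Delta + 2a - 2q) \sum_{i=0}^{N-2} f(q+i\Delta) = 0.$$

The solution $\sum_{i=0}^{N-2} f(q+i\Delta) = 0$ is of no interest because, without affecting the mean square error, we may always arbitrarily set $f(q+i\Delta) = 0$, $i = 0, \ldots, N-2$. Hence $\Delta + 2a - 2q = 0$, which is what we wish to prove.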
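
Property 2 says that the input thresholds of the optimum quantizer sit halfway between adjacent output levels. The following Python sketch checks this numerically under stated assumptions: a standard Gaussian input, hypothetical values for the lowest level a, the step size, and N, and a midpoint-rule integration over [-8, 8]. With the output levels held fixed it scans the threshold offset q and reports the value giving the smallest mean square error; none of the names below come from the paper.

```python
import math


def mse_for_offset(q, a, step, n_levels, n_grid=4000, span=8.0):
    """Midpoint-rule mse of the uniform quantizer for a standard Gaussian input."""
    h = 2.0 * span / n_grid
    total = 0.0
    for k in range(n_grid):
        x = -span + (k + 0.5) * h
        # output level index: 0 at or below q, N-1 above q + (N-2)*step
        level = min(max(math.ceil((x - q) / step), 0), n_levels - 1)
        y = a + level * step
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += (y - x) ** 2 * pdf * h
    return total


a, step, n_levels = -1.3, 0.8, 4            # arbitrary illustrative values
candidates = [a + step / 2 + 0.05 * k for k in range(-10, 11)]
best_q = min(candidates, key=lambda q: mse_for_offset(q, a, step, n_levels))
print("best q on the grid:", best_q)
print("a + step/2        :", a + step / 2)
```

The grid is centred on a + step/2 by construction, so the sketch only confirms that moving q away from that value in either direction increases the error; it is not a general optimizer.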


Property  3 

The  mean  square  error  of  an  optimum  uniform  quantizer  is  given  by  the 
input  variance  minus  the  output  variance. 

Proof: 

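We again write the mean square error mse as

$$\mathrm{mse} = E\{x^2\} - 2E\{x g(x)\} + E\{g(x)^2\}. \qquad (8)$$

We wish to optimize this equation with respect to $\Delta$. Using $a = q - \Delta/2$ we first obtain

$$E\{x g(x)\} = \sum_{i=0}^{N-3} \left(q + (i+\tfrac{1}{2})\Delta\right) \int_{q+i\Delta}^{q+(i+1)\Delta} x f(x)\,dx + (q - \Delta/2) \int_{-\infty}^{q} x f(x)\,dx + \left(q + (N-\tfrac{3}{2})\Delta\right) \int_{q+(N-2)\Delta}^{\infty} x f(x)\,dx \qquad (9)$$

$$E\{g(x)^2\} = \sum_{i=0}^{N-3} \left(q + (i+\tfrac{1}{2})\Delta\right)^2 \int_{q+i\Delta}^{q+(i+1)\Delta} f(x)\,dx + (q - \Delta/2)^2 \int_{-\infty}^{q} f(x)\,dx + \left(q + (N-\tfrac{3}{2})\Delta\right)^2 \int_{q+(N-2)\Delta}^{\infty} f(x)\,dx. \qquad (10)$$

Now substitute Eq. (9) and Eq. (10) into Eq. (8); take the partial derivative with respect to $\Delta$ and set the result equal to zero. We find that

$$E\{x g(x)\} + q E\{g(x)\} = E\{g(x)^2\} + q E\{x\}. \qquad (11)$$

But $E\{g(x)\} = E\{x\}$ for the optimum quantizer. Hence $E\{x g(x)\} = E\{g(x)^2\}$ and we have for the mean square error

$$\mathrm{mse} = E\{x^2\} - E\{g(x)^2\} \qquad (12)$$

which together with Property 1 finishes the proof.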
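
A small worked case may help make Property 3 concrete. The Python sketch below treats the two-level (N = 2) uniform quantizer for a standard Gaussian input, with the threshold placed at zero by symmetry; that placement, the function name, and the closed-form expressions are assumptions of this illustration rather than material from the paper. The output levels are then plus and minus half the step size, the mse is 1 - step E|x| + step^2/4, and setting its derivative to zero gives step = 2 E|x|.

```python
import math


def two_level_gaussian(step):
    """mse and output variance for levels -step/2, +step/2, threshold 0, x ~ N(0, 1)."""
    mean_abs = math.sqrt(2.0 / math.pi)          # E|x| for a standard Gaussian
    mse = 1.0 - step * mean_abs + step * step / 4.0
    var_out = step * step / 4.0                  # output is +-step/2 with zero mean
    return mse, var_out


best_step = 2.0 * math.sqrt(2.0 / math.pi)       # root of d(mse)/d(step) = 0
mse, var_out = two_level_gaussian(best_step)
print("mse at the optimum step        :", mse)
print("input variance - output variance:", 1.0 - var_out)
```

At the optimizing step size the two printed quantities coincide, as Property 3 states; at any other step size they differ, which shows that the identity is a property of the optimum and not of uniform quantizers in general.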


Sripad and Snyder [3] show sufficient conditions for (x - g(x)) to be uniform
and uncorrelated with x to be


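$$\Phi_x\!\left(\frac{2\pi n}{\Delta}\right) = \dot{\Phi}_x\!\left(\frac{2\pi n}{\Delta}\right) = 0, \qquad n = \pm 1, \pm 2, \ldots \qquad (13)$$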


where $\Phi_x(\omega)$ is the characteristic function of the input random variable x.

Frequently  in  the  analysis  of  a  system  corrupted  by  a  uniform  quantizing 
operation  the  assumption  is  made  that  the  quantization  noise  is  uncorrelat¬ 
ed  (sometimes  independence  is  assumed)  with  the  input.  The  next  property 
demonstrates  that  this  can't  be  done  with  the  optimum  uniform  quantizer. 

Property 4

Suppose the input probability density is Riemann integrable. Then the quantization noise cannot be uncorrelated with the input for the optimum uniform quantizer.

Proof:

Without loss of generality assume $E\{x\} = 0$. Now suppose the contrary of the property. This implies

$$E\{(x - g(x))x\} = E\{x^2\} - E\{g(x)x\} = 0. \qquad (14)$$

But from Property 3, $E\{x g(x)\} = E\{g(x)^2\}$; hence

$$E\{x^2\} - E\{g(x)^2\} = 0. \qquad (15)$$

But again from Property 3, the left hand side of Eq. (15) is the mean square error, which implies a contradiction: that a probability density function is Riemann integrable (i.e. f(x) has no delta functions) necessarily implies that the mean square error for any finite number of output levels is greater than zero.


We now state an obvious property which will be used in several subsequent proofs. The proof of Property 5 follows from a simple application of the Lebesgue dominated convergence theorem.

Property 5

The mean square error approaches zero for the optimal uniform quantizer as the number of output levels approaches $\infty$.

Let I = [a,b] be the smallest interval such that $\int_a^b f(x)\,dx = 1$. Note that |a| or |b| may be infinite.

Property 6

Suppose f(x) is Riemann integrable. Then for the optimum uniform quantizer, $\lim_{N\to\infty} (N-2)\Delta = b-a$.

Proof: 

Suppose $\lim_{N\to\infty} (N-2)\Delta < b-a$. This implies for N sufficiently large that we are always truncating some finite amount of probability mass, which means the mean square error can't go to zero, which is a contradiction of the previous property. Hence we have $\lim_{N\to\infty} (N-2)\Delta \ge b-a$.

Let us suppose $\lim_{N\to\infty} (N-2)\Delta > b-a$. Note that this makes sense only if the random variable is of finite support. Now for N large enough there is no truncation error. It is easy to show, as will be done in the next property, that for a quantizer with no truncation error, $\lim_{N\to\infty} \mathrm{mse}/(\Delta^2/12) = 1$ for a Riemann integrable density function. So for N sufficiently large, $(N-2)\Delta > C > b-a$ with $C < \infty$. Then

$$1 = \lim_{N\to\infty} \frac{\mathrm{mse}}{\Delta^2/12} \le \lim_{N\to\infty} \frac{\mathrm{mse}}{C^2/(12(N-2)^2)}$$

or

$$\lim_{N\to\infty} (N-2)^2\,\mathrm{mse} \ge \frac{C^2}{12}. \qquad (16)$$

Consider now a suboptimal quantizer whose input intervals are given by dividing up the interval I into N-2 equal subintervals. Denote the mean square error of this quantizer as $\mathrm{mse}_{SUB}$, and its step size as $\Delta_S = (b-a)/(N-2)$. This quantizer has no truncation error and hence

$$1 = \lim_{N\to\infty} \frac{\mathrm{mse}_{SUB}}{\Delta_S^2/12} = \lim_{N\to\infty} \frac{\mathrm{mse}_{SUB}}{(b-a)^2/(12(N-2)^2)}.$$

Therefore

$$\lim_{N\to\infty} (N-2)^2\,\mathrm{mse}_{SUB} = \frac{(b-a)^2}{12} < \frac{C^2}{12} \le \lim_{N\to\infty} (N-2)^2\,\mathrm{mse},$$

which is a contradiction since we have found a suboptimal quantizer with a better mean square error than the optimal.


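Property 7

Suppose the density function is Riemann integrable and $(b-a) < \infty$. Then for the optimal uniform quantizer we have

$$\lim_{N\to\infty} \frac{\mathrm{mse}}{\Delta^2/12} = 1.$$

Proof:

From Property 6 we know that $\lim_{N\to\infty} (N-2)\Delta_0 = b-a < \infty$, where $\Delta_0$ is the optimum $\Delta$. We may design a suboptimum quantizer by dividing the interval I (the smallest interval such that $\int_a^b f(x)\,dx = 1$) into N-2 equal subintervals and using these subintervals as the breakpoints for our quantizer. We will denote the mean square error associated with this quantizer as $\mathrm{mse}_{SUB}$ and the step size as $\Delta_S = (b-a)/(N-2)$.

Define

$$M_i = \sup_{x \in [q+i\Delta_S,\; q+(i+1)\Delta_S)} f(x)$$

and

$$m_i = \inf_{x \in [q+i\Delta_S,\; q+(i+1)\Delta_S)} f(x).$$

Then since there is no truncation error for the suboptimal quantizer we have

$$\sum_{i=0}^{N-3} m_i \int_{q+i\Delta_S}^{q+(i+1)\Delta_S} \left(x - (q + (i+\tfrac{1}{2})\Delta_S)\right)^2 dx \le \mathrm{mse}_{SUB}$$

and

$$\mathrm{mse}_{SUB} \le \sum_{i=0}^{N-3} M_i \int_{q+i\Delta_S}^{q+(i+1)\Delta_S} \left(x - (q + (i+\tfrac{1}{2})\Delta_S)\right)^2 dx \qquad (18)$$

or

$$\frac{\Delta_S^2}{12} \sum_{i=0}^{N-3} m_i \Delta_S \le \mathrm{mse}_{SUB} \le \frac{\Delta_S^2}{12} \sum_{i=0}^{N-3} M_i \Delta_S, \qquad (19)$$

so that

$$\lim_{N\to\infty} \sum_{i=0}^{N-3} m_i \Delta_S \le \lim_{N\to\infty} \frac{\mathrm{mse}_{SUB}}{\Delta_S^2/12} \le \lim_{N\to\infty} \sum_{i=0}^{N-3} M_i \Delta_S. \qquad (20)$$

Now since f(x) is a density function and is Riemann integrable,

$$\lim_{N\to\infty} \sum_{i=0}^{N-3} m_i \Delta_S = \lim_{N\to\infty} \sum_{i=0}^{N-3} M_i \Delta_S = 1,$$

which implies

$$\lim_{N\to\infty} \frac{\mathrm{mse}_{SUB}}{\Delta_S^2/12} = 1. \qquad (21)$$

Now

$$\lim_{N\to\infty} \frac{\Delta_S}{\Delta_0} = \lim_{N\to\infty} \frac{(N-2)\Delta_S}{(N-2)\Delta_0} = \frac{\lim_{N\to\infty} (N-2)\Delta_S}{\lim_{N\to\infty} (N-2)\Delta_0} = 1,$$

which gives automatically

$$\lim_{N\to\infty} \frac{\Delta_S^2}{\Delta_0^2} = 1.$$

Now for any quantizer whose nontruncation region covers the support of the Riemann integrable density function in the limit as N approaches infinity, we may show as above that $\lim_{N\to\infty} \mathrm{mse}/(\Delta^2/12) \ge 1$. This bound is arrived at by ignoring the truncation error and is true for finite or infinite support density functions. We now have the following string,

$$1 = \lim_{N\to\infty} \frac{\mathrm{mse}_{SUB}}{\Delta_S^2/12} = \left( \lim_{N\to\infty} \frac{\mathrm{mse}_{SUB}}{\Delta_S^2/12} \right) \left( \lim_{N\to\infty} \frac{\Delta_S^2/12}{\Delta_0^2/12} \right) = \lim_{N\to\infty} \frac{\mathrm{mse}_{SUB}}{\Delta_0^2/12} \ge \lim_{N\to\infty} \frac{\mathrm{mse}_{OPTIMAL}}{\Delta_0^2/12} \ge 1,$$

or

$$\lim_{N\to\infty} \frac{\mathrm{mse}_{OPTIMAL}}{\Delta_0^2/12} = 1,$$

which is what we wanted to prove.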
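
The sandwich argument used for the suboptimal quantizer can be seen numerically. The Python sketch below is an illustration under assumptions chosen here, not taken from the paper: the density is f(x) = 3x^2 on [0, 1], the interval is split into equal cells with outputs at the cell midpoints (so there is no truncation error), and the mean square error is evaluated with a midpoint rule inside each cell. The variable `n_cells` plays the role of N-2.

```python
def subdivision_ratio(n_cells, n_sub=200):
    """mse / (step^2 / 12) for f(x) = 3x^2 on [0, 1], equal cells, midpoint outputs."""
    step = 1.0 / n_cells
    mse = 0.0
    for i in range(n_cells):
        lo = i * step
        mid = lo + 0.5 * step                      # output level for this cell
        h = step / n_sub
        for k in range(n_sub):                     # midpoint rule inside the cell
            x = lo + (k + 0.5) * h
            mse += (x - mid) ** 2 * 3.0 * x * x * h
    return mse / (step * step / 12.0)


for n_cells in (2, 8, 32, 128):
    print(n_cells, subdivision_ratio(n_cells))
```

For this particular density the ratio can also be worked out by hand as 1 + step^2/5, so it decreases toward 1 as the number of cells grows, in line with Eq. (21).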


Zador [4] shows that if f(x) is Riemann integrable and $E\{|x|^{2+\delta}\} < \infty$ for some $\delta > 0$, then we have for the optimal nonuniform quantizer

$$\lim_{N\to\infty} N^2\,\mathrm{mse} = \|f\|_{1/3}/12$$

where $\|f\|_{1/3}$ is the $L_{1/3}$ norm. This result shows that for the nonuniform quantizer, the mean square error decreases on the order of $1/N^2$ for large N. Is there a similar property for the optimum uniform quantizer? We now give our next property.

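Property 8

Suppose f(x) is Riemann integrable. Then for the optimum uniform quantizer $\lim_{N\to\infty} N^2\,\mathrm{mse} = (b-a)^2/12$.

Proof:

Suppose $(b-a) = \infty$. Then

$$1 \le \lim_{N\to\infty} \frac{\mathrm{mse}}{\Delta^2/12} = \lim_{N\to\infty} \frac{(N-2)^2\,\mathrm{mse}}{(N-2)^2 \Delta^2/12} = \frac{\lim_{N\to\infty} (N-2)^2\,\mathrm{mse}}{\lim_{N\to\infty} (N-2)^2 \Delta^2/12},$$

but $(N-2)^2 \Delta^2 \to \infty$, which implies $\lim_{N\to\infty} (N-2)^2\,\mathrm{mse} \to \infty$.

If $b-a < \infty$ then $\lim_{N\to\infty} \mathrm{mse}/(\Delta^2/12) = 1$, or

$$\lim_{N\to\infty} (N-2)^2\,\mathrm{mse} = \lim_{N\to\infty} N^2\,\mathrm{mse} = \frac{\lim_{N\to\infty} (N-2)^2 \Delta^2}{12} = \frac{(b-a)^2}{12}, \qquad (24)$$

which finishes the proof.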


Discussion 


We should note that not everyone employs the same definition of optimum uniform quantizer that we have used. For example, Pearlman and Senge [5] have published tables of the optimal uniform Rayleigh quantizer. For their computations, they add the constraints that $a = 0$ and that $q = \Delta/2$.

It is interesting to note that Properties 1 and 3 are also shared by the optimal nonuniform quantizer, as shown in [6]. As a further consequence of these two properties we find that for the N=2 case, the optimum uniform quantizer and the optimum nonuniform quantizer are identical.

Property 7 is one of the more interesting properties proved in this paper. A common approximation to the mean square error of a uniform quantizer has been $\Delta^2/12$. Consider the class of density functions given by

$$f(x) = \frac{1 + \delta/2}{(1 + |x|)^{3+\delta}}, \qquad -\infty < x < \infty.$$



We easily see that $\delta = \sup\{\epsilon : \int |x|^{2+\epsilon} f(x)\,dx < \infty\}$. By straightforward minimization techniques one can show for this class of densities that

$$\lim_{N\to\infty} \frac{\mathrm{mse}}{\Delta^2/12} > 1.$$
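
As a check on the density class above, the following short derivation is a sketch under one assumption: that the class is read as $f(x) = (1+\delta/2)/(1+|x|)^{3+\delta}$ with $\delta > 0$. It confirms that such an f integrates to one and that its absolute moment of order $2+\epsilon$ is finite exactly when $\epsilon < \delta$, because the integrand decays like $|x|^{\epsilon-\delta-1}$ for large $|x|$.

```latex
\int_{-\infty}^{\infty} \frac{1+\delta/2}{(1+|x|)^{3+\delta}}\,dx
  = 2\left(1+\tfrac{\delta}{2}\right)\int_{0}^{\infty} (1+x)^{-(3+\delta)}\,dx
  = \frac{2+\delta}{2+\delta} = 1,
\qquad
\int_{-\infty}^{\infty} |x|^{2+\epsilon} f(x)\,dx < \infty
  \iff \epsilon < \delta .
```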


Property 8 is of interest because it sets forth a basic difference between uniform and nonuniform quantizers. For the nonuniform quantizer we can expect the mean square error to be of the order $1/N^2$. We can expect this rate of convergence to zero for the uniform quantizer only if the probability density is of finite support. We may show for the optimal uniform Gaussian quantizer that the error is the same or larger than $\ln N / N^2$.

REFERENCES 

[1] B. Widrow, "Statistical analysis of amplitude quantized sampled data systems," Trans. Amer. Inst. Elec. Eng., Pt. II, Applications and Industry, Vol. 79, pp. 555-568, Jan. 1960.

[2] H. Gish and J. N. Pierce, "Asymptotically efficient quantizing," IEEE Trans. Inform. Theory, Vol. IT-14, pp. 676-683, Sept. 1968.

[3] A. B. Sripad and D. L. Snyder, "A necessary and sufficient condition for quantization errors to be uniform and white," IEEE Trans. Acoustics, Speech, and Signal Processing, Vol. ASSP-25, pp. 442-448, Oct. 1977.

[4] P. Zador, "Development and evaluation of procedures for quantizing multivariate distributions," Ph.D. dissertation, Stanford University, Stanford, CA, 1964.

[5] W. A. Pearlman and G. H. Senge, "Optimal quantization of the Rayleigh probability distribution," IEEE Trans. Communications, Vol. COM-27, pp. 101-112, Jan. 1979.

[6] J. A. Bucklew and N. C. Gallagher, Jr., "A note on optimum quantization," IEEE Trans. Info. Theory, Vol. IT-25, pp. 365-366, May 1979.


ON  THE  DETERMINATION  OF  REGRESSION  FUNCTIONS 
GARY  L.  WISE 

Department  of  Electrical  Engineering 
University  of  Texas  at  Austin 
Austin,  Texas  78712 

and 

NEAL  C.  GALLAGHER,  JR. 

School  of  Electrical  Engineering 

Purdue  University 

West  Lafayette,  Indiana  47907 

ABSTRACT 


This  paper  is  concerned  with  the  determination  of  regression  functions 
from  only  a  partial  characterization  of  the  joint  distribution.  It  is 
shown  that  statistical  information  consisting  of  various  moments  and  joint 
moments  is  sufficient  to  characterize  a  regression  function.  An  applica¬ 
tion  to  regression  functionals  is  also  considered. 

I .  INTRODUCTION 


Let X and Y be random variables with Y integrable, i.e. $E\{|Y|\} < \infty$, and consider the regression function of Y on X,

$$m(x) = E\{Y \mid X = x\}.$$

As is well known, $m(\cdot)$ is a Borel measurable function, and it frequently arises in engineering applications. For example, if Y is a second order random variable, then the minimum mean squared error estimate of Y in terms of X is given by m(X) [1, pp. 77-78].

In some cases $m(\cdot)$ has a particularly simple form. For example, if X and Y are jointly Gaussian with respective means $m_X$ and $m_Y$, respective variances $\sigma_X^2 > 0$ and $\sigma_Y^2$, and correlation coefficient $\rho$, then

$$m(x) = ax + b, \qquad (1)$$

where $a = (\sigma_Y/\sigma_X)\rho$ and $b = m_Y - \rho(\sigma_Y/\sigma_X) m_X$. However, in the case of jointly Gaussian random variables, $m_X$, $m_Y$, $\sigma_X$, $\sigma_Y$, and $\rho$ completely determine the bivariate distribution of the two random variables.
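
The linear form for the jointly Gaussian case is easy to evaluate directly. The following Python sketch is only an illustration; the function name and argument names are hypothetical. It returns the slope and intercept of the regression line from the two means, the two standard deviations, and the correlation coefficient.

```python
def gaussian_regression_line(mean_x, mean_y, sigma_x, sigma_y, rho):
    """Slope a and intercept b of m(x) = a*x + b for jointly Gaussian X and Y."""
    if sigma_x <= 0:
        raise ValueError("sigma_x must be positive")
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    a = rho * sigma_y / sigma_x          # slope
    b = mean_y - a * mean_x              # intercept, so that m(mean_x) = mean_y
    return a, b


# Hypothetical numbers: means 1 and 2, standard deviations 2 and 4, rho = 0.5.
a, b = gaussian_regression_line(1.0, 2.0, 2.0, 4.0, 0.5)
print("m(x) = %.3f x + %.3f" % (a, b))
```

Note that the line always passes through the point of means, which is a quick sanity check on the intercept.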

In  more  general  cases,  the  question  arises  as  to  how  much  information 
about  the  bivariate  distribution  is  required  to  determine  the  regression 
function.  If  X  and  Y  are  two  second  order  random  variables  that  are 
separable  in  the  sense  of  Nuttall  [2],  then  the  regression  function  m(.) 
has  the  form  given  by  (1).  However,  knowing  that  two  second  order  random 
variables  are  separable  in  the  sense  of  Nuttall,  and  knowing  the  means, 
variances,  and  the  correlation  coefficient  is  not  sufficient  to  determine 
the bivariate distribution of the two random variables. Notice that any
two random variables whose bivariate characteristic function is elliptically
symmetric are separable in the sense of Nuttall [3].

As  we  have  seen,  there  exists  a  class  of  joint  distributions  such  that 
the  regression  function  can  be  determined  knowing  that  the  two  random 
variables  belong  to  that  class  and  also  knowing  means,  variances,  and  the 
correlation  coefficient.  However,  it  might  seem  reasonable  to  conjecture 
that in more general cases, the regular conditional distribution [4] of Y given X=x is required. Although the regular conditional distribution of Y given X=x is sufficient to determine m(x), in the next section we will see that it is never necessary.

Presented at the Seventeenth Annual Allerton Conference on Communication, Control, and Computing, October 10-12, 1979; to be published in the Proceedings of the Conference.

In  this  paper  we  will  be  concerned  with  statistical  information  such 
that  there  can  be  only  one  regression  function  consistent  with  the  given 
statistical information. In the next section we consider the regression
of  Y  on  a  random  variable  and  then  on  a  random  vector.  Then  in  the 
following  section  we  consider  the  regression  functional,  that  is,  the 
regression  of  Y  on  a  random  process. 

II.  DEVELOPMENT 

Let Y be a second order random variable, let X be a random variable with compact support, and let $m(\cdot)$ be given by Eq. (1). Define the measure $\mu$ on the Borel sets of $\mathbb{R}$ by

$$\mu(A) = P(X \in A),$$

and let $\|\cdot\|$ denote the $L_2(\mu)$ norm. We will say that a polynomial has max degree N if the degree of the polynomial is no greater than N. We note that for any $\epsilon > 0$, if N is sufficiently large, there exists a polynomial $P_N(x)$ of max degree N such that

$$\|m - P_N\| < \epsilon. \qquad (2)$$

That is, there exists a continuous function $h(\cdot)$ such that [5]

$$\|m - h\| < \epsilon/2,$$

and by the Weierstrass Theorem there exists a polynomial $P_N$ of max degree N, with N sufficiently large, such that

$$\|h - P_N\| < \epsilon/2.$$

Thus Eq. (2) follows by the triangle inequality. Hence there exists a sequence of polynomials $P_N(x)$ such that

$$P_N(x) \to m(x) \quad \text{in } L_2(\mu).$$

Let $Q_N(x)$ be the polynomial of max degree N that is closer to m(x) (in $L_2(\mu)$) than any other polynomial of max degree N. We note in passing that $Q_N(x)$ is uniquely defined a.e. $[\mu]$ by the Projection Theorem. That is, there may exist more than one representation of $Q_N(x)$ (i.e. with different coefficients) but they are all equal a.e. $[\mu]$. From the preceding observations, we have that

$$Q_N(x) \to m(x) \quad \text{in } L_2(\mu).$$

Express the polynomial $Q_N(x)$ as

$$Q_N(x) = \sum_{j=0}^{N} a_j(N)\, x^j.$$

It follows from the Projection Theorem that the $a_j(N)$ can be determined from the relation

$$E\left\{ \left[ m(X) - \sum_{j=0}^{N} a_j(N) X^j \right] X^k \right\} = 0, \qquad k = 0, 1, 2, \ldots, N.$$

This is equivalent to

$$E\{Y X^k\} = \sum_{j=0}^{N} a_j(N)\, E\{X^{j+k}\}, \qquad k = 0, 1, 2, \ldots, N. \qquad (3)$$

Thus we have seen that from a knowledge of

$$E\{X^k\}, \qquad k = 1, 2, \ldots$$

and

$$E\{Y X^k\}, \qquad k = 0, 1, 2, \ldots$$

we can construct a sequence of polynomials $Q_N(x)$ that converge in $L_2(\mu)$ to m(x).
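
The moment relation above is a linear system in the coefficients, with a Hankel matrix built from the moments of X. The following Python sketch solves it exactly with rational arithmetic. It is a minimal illustration, not the authors' procedure: the function name `regression_polynomial`, the use of Gauss-Jordan elimination, and the example moments (X uniform on [0, 1] with E{Y | X = x} = x^2) are all assumptions introduced here.

```python
from fractions import Fraction


def regression_polynomial(x_moments, yx_moments, degree):
    """Solve sum_j a_j E{X^(j+k)} = E{Y X^k}, k = 0..degree, by Gauss-Jordan elimination.

    x_moments[k]  holds E{X^k} for k = 0..2*degree (with E{X^0} = 1),
    yx_moments[k] holds E{Y X^k} for k = 0..degree.
    Assumes the moment matrix is nonsingular.
    """
    n = degree + 1
    # augmented matrix [Hankel moment matrix | right-hand side]
    rows = [[Fraction(x_moments[j + k]) for j in range(n)] + [Fraction(yx_moments[k])]
            for k in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]      # normalize pivot row
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [rows[k][n] for k in range(n)]


# Hypothetical example: X uniform on [0, 1] and E{Y | X = x} = x^2,
# so E{X^k} = 1/(k+1) and E{Y X^k} = E{X^(k+2)} = 1/(k+3).
degree = 2
x_moments = [Fraction(1, k + 1) for k in range(2 * degree + 1)]
yx_moments = [Fraction(1, k + 3) for k in range(degree + 1)]
coeffs = regression_polynomial(x_moments, yx_moments, degree)
print([str(c) for c in coeffs])
```

In this example the regression function is itself a polynomial of degree 2, so the degree-2 solution recovers it exactly; with a lower degree the same routine returns the best approximation in the mean square sense.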

Now let X be an arbitrary random variable. Let g be an invertible Borel measurable function whose range is bounded. Define the random variable $\tilde{X}$ as $\tilde{X} = g(X)$, and the measure $\tilde{\mu}$ on the Borel sets of $\mathbb{R}$ by $\tilde{\mu}(A) = P(\tilde{X} \in A)$. From the above discussion, we see that

$$\tilde{m}(x) = E\{Y \mid \tilde{X} = x\}$$

is determined a.e. $[\tilde{\mu}]$ by the quantities

$$E\{\tilde{X}^k\}, \qquad k = 1, 2, \ldots \qquad (4)$$

and

$$E\{Y \tilde{X}^k\}, \qquad k = 0, 1, 2, \ldots \qquad (5)$$

Let $\tilde{Q}_N(x)$ denote the polynomial of max degree N constructed in the preceding fashion. Then

$$\tilde{Q}_N(x) \to \tilde{m}(x) \quad \text{in } L_2(\tilde{\mu}).$$

Notice that $m(x) = \tilde{m}[g(x)]$. From a change of variables result [6, p. 182], we have that

$$\int_{g(\mathbb{R})} [\tilde{Q}_N(x) - \tilde{m}(x)]^2\, \tilde{\mu}(dx) = \int_{\mathbb{R}} [\tilde{Q}_N[g(x)] - m(x)]^2\, \mu(dx).$$

Therefore, $\tilde{Q}_N[g(x)] \to m(x)$ in $L_2(\mu)$.

Now we will remove the restriction that Y be second order. Assume that Y is an integrable random variable and let

$$G_k(y) = \begin{cases} y & \text{if } |y| \le k \\ 0 & \text{if } |y| > k. \end{cases}$$

Then $G_k(Y)$ is a second order random variable and [1, p. 23]

$$E\{G_k(Y) \mid X = x\} \to E\{Y \mid X = x\} \quad \text{a.e. } [\mu].$$

Since $|G_k(Y) - Y| \le |Y|$ and $|Y|$ is integrable, we have that $E\{G_k(Y) \mid X = x\} \to m(x)$ in $L_1(\mu)$ by the dominated convergence theorem [6, pp. 124-125].

Thus from a knowledge of the quantities in Eqs. (4) and (5) we can derive a sequence of estimates for $E\{G_k(Y) \mid X = x\}$ which converges in $L_2(\mu)$ and consequently in $L_1(\mu)$ (see, for example, [7]). Also, $E\{G_k(Y) \mid X = x\}$ converges to $E\{Y \mid X = x\}$ in $L_1(\mu)$. Thus, by a straightforward diagonalization procedure, we can derive a sequence of estimates which converges in $L_1(\mu)$ to m(x). These results are summarized in the following theorem.

Theorem 1: Let Y be an integrable random variable, let X be an arbitrary random variable, and let g be an invertible Borel measurable function mapping the reals into a bounded set. Then the regression function m is determined a.e. $[\mu]$ by the quantities

$$E\{[g(X)]^k\}, \qquad k = 1, 2, \ldots$$

and

$$E\{Y [g(X)]^k\}, \qquad k = 0, 1, 2, \ldots$$

Consider for the moment the case where X and Y are independent. In this case a solution to Eq. (3) is given by

$$a_0(N) = E\{Y\}$$

$$a_j(N) = 0, \qquad j > 0,$$

and we get that $m(x) = E\{Y\}$.
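
The independent case can be verified in one line. The following check is a sketch added for illustration: it substitutes the constant solution $a_0(N) = E\{Y\}$, $a_j(N) = 0$ for $j > 0$ into the moment relation and uses only the factorization of expectations under independence.

```latex
\sum_{j=0}^{N} a_j(N)\, E\{X^{j+k}\}
  = E\{Y\}\, E\{X^{k}\}
  = E\{Y X^{k}\},
\qquad k = 0, 1, \ldots, N .
```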

Now  consider  the  following  two  different  bivariate  density  functions: 

(‘<x,y> '  ;#v4  Wx) 


(8) 


2p 

k+3  * 




E{YXk}  - 


In this case, for $N \ge 1$, Eqs. (6) and (7) still satisfy Eq. (3), and the regression function is once again given by Eq. (8). Thus, in this example, the two pairs of marginal densities are not the same, the conditional densities of Y given X = x are not the same, and the moment sequences are not the same; however, the moment sequences are sufficient to characterize the conditional expectations, which are identical. Numerous other similar examples may easily be constructed.

Now we will consider the regression of Y upon a set of random variables. Let X be an arbitrary random vector taking values in $\mathbb{R}^n$, and let $\mu$ be defined on the Borel sets of $\mathbb{R}^n$ by

$$\mu(B) = P(X \in B).$$

Lemma 1: If $\mu$ has compact support, then the class of all polynomials is dense in $L_2(\mu)$.

Proof: Let q be an arbitrary element in $L_2(\mu)$. For any $\epsilon > 0$, there exists [5] a function $h: \mathbb{R}^n \to \mathbb{R}$ which is continuous and has compact support such that

$$\|q - h\| < \epsilon/2.$$

By the Stone-Weierstrass Theorem [8] there exists a polynomial p in n variables such that

$$\|h - p\| < \epsilon/2,$$

and thus by the triangle inequality

$$\|p - q\| < \epsilon.$$

QED

We recall that the degree of a monomial in n variables is the sum of the powers of the variables, and the degree of a polynomial is the degree of the monomial having the largest degree over all the monomials in the polynomial with nonzero coefficients. There are

$$C(n,d) = \binom{n+d-1}{d}$$

monomials of degree d in n variables [9].
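
The count of monomials can be confirmed by brute-force enumeration. The following Python sketch is illustrative only; the function names are hypothetical. It uses the standard closed form for the number of monomials of total degree d in n variables, and it also lists the exponent tuples in lexicographic order, which is the ordering used for the monomials in the development that follows.

```python
from itertools import product
from math import comb


def count_monomials(n_vars, degree):
    """Number of monomials of total degree `degree` in `n_vars` variables."""
    return comb(n_vars + degree - 1, degree)


def list_monomials(n_vars, degree):
    """Exponent tuples of total degree `degree`, sorted lexicographically."""
    return sorted(e for e in product(range(degree + 1), repeat=n_vars)
                  if sum(e) == degree)


# Example: degree-2 monomials in 3 variables (x1^2, x1*x2, ..., x3^2).
print(count_monomials(3, 2))
print(list_monomials(3, 2))
```

Each tuple gives the powers of the components of x, so (1, 0, 1) stands for the monomial x1 x3.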

Assume that Y is a second order random variable, and define m(x) by Eq. (1), where x is now an element of $\mathbb{R}^n$. Assume that $\mu$ has compact support. Let $Q_N(x)$ be the polynomial of max degree N which is closer, in the $L_2(\mu)$ norm, to m(x) than any other polynomial of max degree N.

Consider the monomials in n variables of degree d. There will be C(n,d) of them. Order them lexicographically by the powers of the components of x, and let $m_{jd}(x)$ denote the j-th monomial of degree d. Then $Q_N(x)$ can be expressed as

$$Q_N(x) = \sum_{d=0}^{N} \sum_{j=1}^{C(n,d)} a_{jd}(N)\, m_{jd}(x).$$

It follows from the Projection Theorem that the coefficients $a_{jd}(N)$ are given by the solution to the following set of equations:

$$E\{Y m_{ik}(X)\} = \sum_{d=0}^{N} \sum_{j=1}^{C(n,d)} a_{jd}(N)\, E\{m_{jd}(X)\, m_{ik}(X)\}, \qquad (9)$$

$k = 0, 1, \ldots, N$ and $i = 1, \ldots, C(n,k)$. If the coefficients $a_{jd}(N)$ satisfy Eq. (9), then it follows from Lemma 1 that

$$Q_N(x) \to m(x) \quad \text{in } L_2(\mu).$$


Now we remove the assumption that X has compact support and let X be an arbitrary random vector taking values in $\mathbb{R}^n$. Let g be an invertible Borel measurable function mapping $\mathbb{R}^n$ into a bounded subset of $\mathbb{R}^n$, and let $\tilde{X} = g(X)$. We see that

$$\tilde{m}(x) = E\{Y \mid \tilde{X} = x\}$$

is determined a.e. $[\tilde{\mu}]$, where $\tilde{\mu}(A) = \mu[g^{-1}(A)]$, by the quantities

$$E\{m_{jd}(\tilde{X})\}$$

and

$$E\{Y m_{jd}(\tilde{X})\}$$

for $d = 0, 1, 2, \ldots$ and $j = 1, \ldots, C(n,d)$. Let $\tilde{Q}_N(x)$ be the polynomial of max degree N determined in the preceding fashion. Then, similar to the development of Theorem 1, we can employ a change of variables result [6, p. 182] to conclude that

$$\tilde{Q}_N[g(x)] \to m(x) \quad \text{in } L_2(\mu).$$

A chopping argument as in the development of Theorem 1 allows us to remove the second order restriction on Y. Then a straightforward diagonalization procedure results in a sequence of estimates which converges to m(x) in $L_1(\mu)$. This result is summarized in the following theorem.

Theorem 2: Let Y be an integrable random variable, let X be an arbitrary random vector taking values in $\mathbb{R}^n$, and let g be an invertible Borel measurable function mapping $\mathbb{R}^n$ into a bounded subset of $\mathbb{R}^n$. Then the regression function m is determined a.e. $[\mu]$ by the quantities

$$E\{m_{jd}[g(X)]\} \quad \text{and} \quad E\{Y m_{jd}[g(X)]\}$$

for $d = 0, 1, 2, \ldots$ and $j = 1, \ldots, C(n,d)$.


III. REGRESSION FUNCTIONALS


As before, assume that Y is an integrable random variable, but now let T be an infinite subset of $\mathbb{R}$ and let $\{X(t), t \in T\}$ be a random process. Let S denote the space of all extended real valued functions defined on T, and let $\mathcal{B}(S)$ denote the $\sigma$-algebra on S generated by the class of all cylinders in S. Let $\mathcal{B}$ denote the Borel sets of $\mathbb{R}$. Then the regression functional

$$m[x(t), t \in T] = E\{Y \mid X(t) = x(t),\ t \in T\}$$

is a measurable function from $(S, \mathcal{B}(S))$ to $(\mathbb{R}, \mathcal{B})$ (see, for example, [10]).

Let $\mu$ be the measure induced on $\mathcal{B}(S)$ by $\{X(t), t \in T\}$. That is, for any cylinder C in S, $\mu(C) = P(\{X(t), t \in T\} \in C)$, and $\mu$ is extended to $\mathcal{B}(S)$ via Kolmogorov's Theorem (see, for example, [11]).

It follows from [1, pp. 21, 604] that there exists a countable subset of T, say $T_0 = \{t_1, t_2, \ldots\}$, depending on the random variable Y, such that

$$E\{Y \mid X(t) = x(t),\ t \in T\} = E\{Y \mid X(t) = x(t),\ t \in T_0\} \quad \text{a.e. } [\mu].$$


Let 

M  -  E{Y|x(t),  t  £T], 

Mn  -  E(Y|X(t1) . X(t„)  }  , 

o(X(t),  t  £T], 


and 


4T  -  o{X(tl),  ....  X(tn)}. 

Then  from  the  properties  of  Iterated  conditional  expectations  [1,  p.  37], 
it  follows  that 

w'WV  -  M„  • 

and  hence  (M  ,  <j? ,  n  >  1}  is  a  martingale.  It  follows  from  [1,  p.  332] 
n  n  — 

that  -*•  M  wpl.  Since  E{  j | }  <_  E{|y|}  <  <*>,  it  follows  from  a  martingale 

convergence  theorem  [1,  p.  319]  due  to  Doob  that  E{  |  M  -M|}  -*•  0.  This  is 
equivalent  to 

E(Y|x(t1)  -  x(ti),  i-l,...,n}  E{Y|X(t)  -  x(t),  t  £  T} 

in  L^(p).  Notice  that  Theorem  2  is  applicable  to  E{Y|X(t^)  ■  x(t^), 

i»l, . . .  .  Thus  a  straightforward  diagonalization  procedure  results  in 
a  sequence  of  estimates  which  converges  to  m[x(t),  t £ T]  in  L^(p).  This 

result  is  summarized  in  the  following  theorem. 

Theorem 3: Let $Y$ be an integrable random variable and let $\{X(t), t \in T\}$ be a random process. Let $\{g_n, n = 1, 2, \ldots\}$ be a sequence of functions where $g_n$ is an invertible Borel measurable function from $\mathbb{R}^n$ to a bounded subset of $\mathbb{R}^n$. Assume that for all positive integers $n$ and for all sets of $n$ points in $T$, say $t_1, \ldots, t_n$, the quantities

$$E\{m_{jd}(g_n[X(t_1), \ldots, X(t_n)])\}$$

and

$$E\{Y m_{jd}(g_n[X(t_1), \ldots, X(t_n)])\}$$

for $d = 0, 1, 2, \ldots$ and $j = 1, \ldots, C(n,d)$ are known. Then up to $\mu$ equivalence, there is only one possible regression functional $m[x(t), t \in T] = E\{Y \mid X(t) = x(t), t \in T\}$.


ACKNOWLEDGEMENT 

This research was supported by the Air Force Office of Scientific Research, Air Force Systems Command, USAF, under Grants AFOSR-76-3062 and AFOSR-78-3605.

REFERENCES 


1. J. L. Doob, Stochastic Processes, Wiley, New York, 1953.

2. A. H. Nuttall, "Theory and Application of the Separable Class of Random Processes," Technical Report 343, Research Laboratory of Electronics, Massachusetts Institute of Technology, May 26, 1958.

3. D. K. McGraw and J. F. Wagner, "Elliptically Symmetric Distributions," IEEE Trans. Inform. Th., Vol. IT-14, pp. 110-120, January 1968.

4. L. Breiman, Probability, p. 79, Addison-Wesley, Reading, Mass., 1968.

5. W. Rudin, Real and Complex Analysis, p. 71, McGraw-Hill, New York, 1974.

6. N. Dunford and J. T. Schwartz, Linear Operators Part I: General Theory, Interscience, New York, 1957.

7. M. Loeve, Probability Theory, p. 164, Van Nostrand, New York, 1963.

8. J. Dieudonné, Foundations of Modern Analysis, p. 139, Academic Press, New York, 1969.

9. R. W. Brockett, "Lie Algebras and Lie Groups in Control Theory" in Geometric Methods in System Theory, D. Q. Mayne and R. W. Brockett, eds., Reidel, The Netherlands, 1973, pp. 43-82.

10. I. I. Gihman and A. V. Skorohod, The Theory of Stochastic Processes I, p. 34, Springer-Verlag, New York, 1974.

11. P. Billingsley, Probability and Measure, p. 433, Wiley, New York, 1979.


IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-25, NO. 5, SEPTEMBER 1979   537

Quantization  Schemes  for  Bivariate 
Gaussian  Random  Variables 

JAMES A. BUCKLEW AND NEAL C. GALLAGHER, JR., MEMBER, IEEE


Abstract: The problem of quantizing two-dimensional Gaussian random variables is considered. It is shown that, for all but a finite number of cases, a polar representation gives a smaller mean square quantization error than a Cartesian representation. Applications of the results to a transform coding scheme known as spectral phase coding are discussed.

I. Introduction

CONSIDER a two-dimensional Gaussian random variable X with independent components. For many applications in signal processing and digital communications it is necessary to represent this quantity by a finite set of values. One possible representation of X is in Cartesian coordinates, obtained by individually quantizing the two rectangular components of X. An alternative representation, in polar coordinates, is obtained by quantizing the magnitude and phase angle of X.

In [1] experimental data are put forward to show that, in all of the cases treated, polar formatting is better than rectangular. The purpose of this paper is to give a more rigorous treatment of the problem and to ascertain which of the representations leads to a smaller mean square quantization error.

In the first section we will derive the exact error expression for the polar format. The second and third sections deal with computer simulations of the expression and compare the polar and rectangular formats. It is shown that, in almost all cases, the polar format gives a smaller quantization error.

If the polar format is to be used, the question arises as to the best ratio of the number of phase quantizer levels to the number of magnitude quantizer levels. Pearlman [2] used distortion rate theory to derive a bound for this expression. In the fourth section we derive an asymptotic expression that agrees with the Pearlman result and perform computer simulations showing the validity of this bound.

In the fifth section we apply the above results to a transform coding scheme, spectral phase coding (SPC). Theoretical arguments are given for the observed robustness of SPC, and an exact error expression is derived. Computer simulations are then made demonstrating the robustness of SPC.

II. Development

Consider the mean square quantization error $E_p$ of a polar format representation:

$$E_p = \sum_{j=1}^{N_\theta} \sum_{i=1}^{N_r} \int_{a_{i-1}}^{a_i} \int_{c_{j-1}}^{c_j} \left| r \exp(\mathrm{j}\theta) - b_i \exp(\mathrm{j} d_j) \right|^2 \frac{f(r)}{2\pi} \, d\theta \, dr \tag{1}$$

where $N_\theta$ and $N_r$ are the number of levels in the phase and magnitude quantizers, respectively, and the upright $\mathrm{j}$ in the exponents denotes $\sqrt{-1}$. The $b_i$ and $d_j$ are the output levels of the magnitude and phase quantizers corresponding to input levels lying in the intervals $(a_{i-1}, a_i]$ and $(c_{j-1}, c_j]$, respectively. The function $f(r)$ is the input density of the magnitude, which is Rayleigh distributed and independent of the random phase $\theta$, which is uniformly distributed over $[-\pi, \pi)$.

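After squaring out the integrand and integrating over $\theta$ from $c_{j-1}$ to $c_j$, we obtain

$$E_p = \sum_{j=1}^{N_\theta} \sum_{i=1}^{N_r} \int_{a_{i-1}}^{a_i} \left[ (c_j - c_{j-1})(r^2 + b_i^2) - 2 r b_i \left( \sin(c_j - d_j) - \sin(c_{j-1} - d_j) \right) \right] \frac{f(r)}{2\pi} \, dr. \tag{2}$$

Setting $\partial E_p / \partial d_j = 0$ and, likewise, $\partial E_p / \partial c_j = 0$ leads to the equations

$$d_j = \frac{c_j + c_{j-1}}{2} \tag{3a}$$

$$c_j - c_{j-1} = \frac{2\pi}{N_\theta} \tag{3b}$$

for $j = 1, \ldots, N_\theta$. It should be noted that these are simply the equations for a uniform quantizer. Consequently, the expression for mean square error becomes

$$E_p = \sum_{i=1}^{N_r} \int_{a_{i-1}}^{a_i} \left[ r^2 + b_i^2 - 2 r b_i \, \mathrm{sinc}(1/N_\theta) \right] f(r) \, dr, \tag{4}$$

where $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$. A differentiation with respect to $b_i$ yields the optimum $b_i$ as

$$b_i = \mathrm{sinc}(1/N_\theta) \, \frac{\int_{a_{i-1}}^{a_i} r f(r) \, dr}{\int_{a_{i-1}}^{a_i} f(r) \, dr}. \tag{5}$$

Manuscript received November 18, 1977; revised December 18, 1978. This work was supported by the Air Force Office of Scientific Research, Air Force Systems Command, USAF, under Grant AFOSR-78-3605.
The authors are with the School of Electrical Engineering, Purdue University, West Lafayette, IN 47907.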


0018-9448/79/0900-0537$00.75 ©1979 IEEE




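Substituting this value back into (4), we find

$$E_p = \overline{r^2} - \mathrm{sinc}^2(1/N_\theta) \sum_{i=1}^{N_r} \frac{\left[ \int_{a_{i-1}}^{a_i} r f(r) \, dr \right]^2}{\int_{a_{i-1}}^{a_i} f(r) \, dr}, \tag{6}$$

where the upper bar indicates the statistical expectation operator. Let $E(N_r, r)$ denote the mean square quantization error produced by an optimal, one-dimensional, $N_r$ output level Rayleigh quantizer. It is shown in [3] that $E(N_r, r)$ is given by the difference between the variance of the quantizer input and the variance of the output. Hence $E(N_r, r)$ may be written as

$$E(N_r, r) = \overline{r^2} - \sum_{i=1}^{N_r} {b_i'}^2 \int_{a_{i-1}'}^{a_i'} f(r) \, dr, \tag{7}$$

where the $\{a_i'\}$ are the quantizer input interval endpoints and the $\{b_i'\}$ are the quantizer output levels. Max [4] shows that the $\{b_i'\}$ and $\{a_i'\}$ satisfy

$$a_i' = \frac{b_i' + b_{i+1}'}{2} \tag{8a}$$

$$b_i' = \frac{\int_{a_{i-1}'}^{a_i'} r f(r) \, dr}{\int_{a_{i-1}'}^{a_i'} f(r) \, dr}. \tag{8b}$$

These equations may be written as

$$a_i' = \frac{\int_{a_{i-1}'}^{a_i'} r f(r) \, dr}{2 \int_{a_{i-1}'}^{a_i'} f(r) \, dr} + \frac{\int_{a_i'}^{a_{i+1}'} r f(r) \, dr}{2 \int_{a_i'}^{a_{i+1}'} f(r) \, dr}. \tag{9}$$

Minimizing (4) with respect to the $a_i$ yields

$$a_i = \frac{b_i + b_{i+1}}{2 \, \mathrm{sinc}(1/N_\theta)}, \tag{10}$$

and substituting (5) into the above gives

$$a_i = \frac{\int_{a_{i-1}}^{a_i} r f(r) \, dr}{2 \int_{a_{i-1}}^{a_i} f(r) \, dr} + \frac{\int_{a_i}^{a_{i+1}} r f(r) \, dr}{2 \int_{a_i}^{a_{i+1}} f(r) \, dr}, \tag{11}$$

which is identical to (9). Fleischer [5] shows that Max's conditions (i.e., (8a) and (8b)) are necessary and sufficient for the optimality of the Rayleigh quantizer. Thus we are assured that the solutions to (11) are unique, leading us to the conclusion that

$$a_i = a_i', \qquad b_i = \mathrm{sinc}(1/N_\theta) \, b_i'.$$

The polar format error expression then becomes

$$E_p = \mathrm{sinc}^2(1/N_\theta) \, E(N_r, r) + \left( 1 - \mathrm{sinc}^2(1/N_\theta) \right) \overline{r^2}. \tag{12}$$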
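The conditions (8a) and (8b) and the error expressions (7) and (12) can be explored numerically. The following is a minimal sketch, assuming a Rayleigh density with unit variance per Gaussian component, $f(r) = r\exp(-r^2/2)$, so that $\overline{r^2} = 2$. The function names, the equiprobable starting cells, and the fixed iteration count are illustrative assumptions, not part of the paper.

```python
import math


def _prob(a, b):
    """P(a < r <= b) for the Rayleigh density f(r) = r*exp(-r^2/2)."""
    return math.exp(-a * a / 2.0) - math.exp(-b * b / 2.0)


def _first_moment(a, b):
    """Integral of r*f(r) over (a, b], in closed form via erf."""
    def g(x):  # x*exp(-x^2/2), with its limit 0 at infinity
        return 0.0 if math.isinf(x) else x * math.exp(-x * x / 2.0)
    root = math.sqrt(math.pi / 2.0)
    erf_diff = math.erf(b / math.sqrt(2.0)) - math.erf(a / math.sqrt(2.0))
    return g(a) - g(b) + root * erf_diff


def _centroids(ends):
    """Output levels from (8b): the centroid of each input interval."""
    return [_first_moment(ends[i], ends[i + 1]) / _prob(ends[i], ends[i + 1])
            for i in range(len(ends) - 1)]


def sinc(x):
    """sinc(x) = sin(pi x)/(pi x), as defined after (4)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def rayleigh_quantizer_error(n_levels, iterations=500):
    """Mean square error E(N_r, r) of a Lloyd-Max Rayleigh quantizer, eq. (7)."""
    # Assumed starting point: equiprobable cells, a_i = sqrt(-2 ln(1 - i/n)).
    ends = [math.sqrt(-2.0 * math.log(1.0 - i / n_levels))
            for i in range(n_levels)]
    ends.append(math.inf)
    for _ in range(iterations):
        levels = _centroids(ends)
        # (8a): each interior endpoint is the midpoint of adjacent levels.
        for i in range(1, n_levels):
            ends[i] = 0.5 * (levels[i - 1] + levels[i])
    levels = _centroids(ends)
    output_power = sum(levels[i] ** 2 * _prob(ends[i], ends[i + 1])
                       for i in range(n_levels))
    return 2.0 - output_power  # mean(r^2) = 2 for this density


def polar_error(n_r, n_theta):
    """Polar format error from (12)."""
    s2 = sinc(1.0 / n_theta) ** 2
    return s2 * rayleigh_quantizer_error(n_r) + (1.0 - s2) * 2.0
```

With a single magnitude level the quantizer output is the Rayleigh mean, so `rayleigh_quantizer_error(1)` reduces to the Rayleigh variance $2 - \pi/2$, which gives a quick sanity check on the closed-form integrals.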

If we assume bit rate limited signal transmission, then we must constrain the product of $N_\theta$ and $N_r$ to be less than or equal to some constant, let us say $N$. To compare the rectangular and polar formats, it is assumed that the product of $N_x$ and $N_y$, the number of output levels of the rectangular format quantizers, must also equal $N$. By use of symmetry arguments it may be shown that, for optimal rectangular format operation, $N_x$ must equal $N_y$. Therefore,

$$N_x = N_y = N^{1/2}. \tag{13}$$

Let $E(N_x, g)$ denote the mean square quantization error produced by an optimal $N_x$ output level Gaussian quantizer. The rectangular format error $E_{\mathrm{rect}}$ is given by

$$E_{\mathrm{rect}} = 2E(N_x, g) = 2E(\sqrt{N}, g). \tag{14}$$

The problem is now to compare (12) with (14).

III. Exact Computer Simulation

In this section we make use of Max's [4] tabulated results for $E(N_x, g)$. Max gives values of this function from $N_x = 1$ to $N_x = 36$. We duplicate Max's work for the Rayleigh quantizer and obtain values for $E(N_r, r)$. Using an exhaustive search, we compute the smallest values of error obtainable for (12) and (14) for values of $N$ from 1 to 2000. For all of these cases, there are only 31 values of $N$ for which the rectangular format is better. These values


TABLE I

Values of N where Rectangular Format Is Superior to Polar Format

Based upon exact expressions:
6, 8, 9, 12, 13, 16, 17, 20, 21, 25, 26, 35, 36, 37, 38, 42, 43, 49, 50, 51, 56, 57, 58, 59, 63, 64, 72, 73, 74, 100, 101

Based upon approximate expressions:
1, 2, 3, 4, 6, 8, 9, 12, 13, 15, 16, 17, 20, 21, 24, 25, 26, 27, 28, 29, 30, 31, 32, 35, 36, 37, 38, 42, 43, 44, 48, 49, 50, 51, 56, 57, 58, 59, 63, 64, 65, 66, 67, 72, 73, [entries between 73 and 110 illegible in the source scan], 110, 111, 112, 113



TABLE II

A Tabulation of the Relative Efficiency η = (E_p - E_r)/E_p of Polar Quantization over that of Rectangular Quantization, the Best Number of Magnitude Levels N_r, and the Best Number of Rectangular Format Levels N_x as a Function of N

(A "?" marks a digit that is illegible in the source scan.)

  N        η  N_r  N_x |   N        η  N_r  N_x |   N        η  N_r  N_x
  1     .000    1    1 |  51    3.801    4    7 | 101     .578    5   10
  2    -.001    1    1 |  52   -3.544    4    7 | 102   -4.424    6   10
  3  -28.572    1    1 |  53   -3.544    4    7 | 103   -4.424    6   10
  4    -.005    1    2 |  54    -.990    4    6 | 104   -4.424    6   10
  5  -16.226    1    2 |  55   -3.459    5    6 | 105   -4.424    6   10
  6    2.468    1    2 |  56    1.613    4    7 | 106   -4.424    6   10
  7   -4.084    1    2 |  57    1.613    4    7 | 107   -4.424    6   10
  8    1.325    2    2 |  58    1.613    4    7 | 108   -6.469    6    9
  9   22.679    1    3 |  59    1.613    4    7 | 109   -6.469    6    9
 10    -.703    2    3 |  60   -5.316    5    7 | 110   -1.554    6   10
 11    -.703    2    3 |  61   -5.316    5    7 | 111   -1.554    6   10
 12     .620    2    3 |  62   -5.316    5    7 | 112   -1.554    6   10
 13     .620    2    3 |  63    3.655    5    7 | 113   -1.554    6   10
 14  -15.048    2    3 |  64    4.369    4    8 | 114   -6.812    6   10
 15   -1.004    2    3 |  65   -1.544    5    8 | 115   -6.812    6   10
 16    1.936    2    4 |  66   -1.544    5    8 | 116   -6.812    6   10
 17    1.936    2    4 |  67   -1.544    5    8 | 117   -6.204    6    9
 18   -6.640    2    4 |  68   -1.544    5    8 | 118   -6.204    6    9
 19   -6.640    2    4 |  69   -1.544    5    8 | 119   -8.422    7    9
 20    4.3??    2    4 |  70   -6.538    5    7 | 120   -4.120    6   10
 21    1.220    3    4 |  71   -6.538    5    7 | 121   -1.919    6   11
 22    -.660    2    4 |  72     .689    5    8 | 122   -1.919    6   11
 23    -.660    2    4 |  73     .689    5    8 | 123   -1.919    6   11
 24   -2.631    3    4 |  74     .689    5    8 | 124   -1.919    6   11
 25    6.49?    3    5 |  75   -6.442    5    8 | 125   -1.919    6   11
 26    6.49?    3    5 |  76   -6.442    5    8 | 126   -6.151    6   11
 27   -5.911    3    5 |  77   -6.442    5    8 | 127   -6.151    6   11
 28   -5.911    3    5 |  78   -6.442    5    8 | 128   -6.151    6   11
 29   -5.911    3    5 |  79   -6.442    5    8 | 129   -6.151    6   11
 30   -1.022    3    5 |  80   -4.180    5    8 | 130   -2.146    6   10
 31   -1.022    3    5 |  81    -.972    5    9 | 131   -2.146    6   10
 32   -1.022    3    5 |  82    -.972    5    9 | 132   -1.845    6   11
 33   -9.643    3    5 |  83    -.972    5    9 | 133   -4.015    7   11
 34   -9.643    3    5 |  84   -2.232    6    9 | 134   -4.015    7   11
 35    1.471    3    5 |  85   -6.499    5    9 | 135   -4.015    7   11
 36    1.388    3    6 |  86   -6.499    5    9 | 136   -4.015    7   11
 37    1.388    3    6 |  87   -6.499    5    9 | 137   -4.015    7   11
 38    1.388    3    6 |  88   -2.789    5    8 | 138   -5.298    6   11
 39   -4.288    3    6 |  89   -2.789    5    8 | 139   -5.298    6   11
 40   -?.440    4    5 |  90   -1.765    5    9 | 140   -8.395    7   10
 41   -?.440    4    5 |  91   -1.765    5    9 | 141   -8.395    7   10
 42    ?.885    3    6 |  92   -1.765    5    9 | 142   -8.395    7   10
 43    ?.885    3    6 |  93   -1.765    5    9 | 143   -2.599    7   11
 44   -2.196    4    6 |  94   -1.765    5    9 | 144    -.750    7   12
 45   -2.196    4    6 |  95   -6.090    5    9 | 145    -.750    7   12
 46   -2.196    4    6 |  96   -8.522    6    9 | 146    -.750    7   12
 47   -2.196    4    6 |  97   -8.522    6    9 | 147   -5.440    7   12
 48   -1.140    4    6 |  98   -8.522    6    9 | 148   -5.440    7   12
 49    3.801    4    7 |  99    -.593    6    9 | 149   -5.660    7   12
 50    3.801    4    7 | 100     .578    5   10 | 150   -5.660    7   12

for $N$ correspond in general to regions where $N$ is a perfect square. Apparently, for values of $N$ greater than 101, polar formatting is always the better of the two methods. The left column of Table I contains a listing of the 31 values of $N$ for which rectangular format gives smaller error. Table II gives an indication of the relative efficiency of polar and rectangular formatting by tabulating $(E_p - E_r)/E_p$, where $E_r$ denotes the rectangular format error, for values of $N$ from 1 to 150. Also in Table II may be found the best number of magnitude levels $N_r$ (with $N_\theta$ the greatest integer less than or equal to $N/N_r$) and the best number of rectangular format levels $N_x$ (with $N_y$ the greatest integer less than or equal to $N/N_x$) for each value of $N$ from 1 to 150. For values of $N$ larger than 2000, we may make use of approximation methods.

IV. Approximate Computer Simulation

Wood [6] describes a technique whereby one can approximate the mean square error of an optimal quantizer for large $N$. He then gives an expression for the error of an $N$ level Gaussian quantizer which agrees to within about one percent with the actual computed mean square error given by Max [4]. This error expression is

$$E(N_x, g) = \frac{2.73 \, N_x \sigma^2}{(N_x + 0.853)^3}. \tag{15}$$

Using Wood's approximations, we obtain for the Rayleigh density a similar error expression which also agrees well with the actual computed error. This error expression is

$$E(N_r, r) = \frac{0.9287 \, N_r \sigma^2}{(0.596 + N_r)^3}. \tag{16}$$

By use of these approximate error expressions, we again find the values of $N$ where rectangular format gives smaller error than polar format. Computer simulations are run up to a value of $N = 10^5$. We find that for values of $N$ greater than 113, polar format is always better.
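The comparison described above can be sketched in a few lines using (12), (15), and (16). This is an illustrative sketch only: the function names are hypothetical, the rectangular error for a non-square $N$ is assumed to be $E(N_x, g) + E(N_y, g)$ with $N_y$ the integer part of $N/N_x$, and the approximations are intended for large $N$, so the small-$N$ entries of Table I should not be expected to be reproduced exactly.

```python
import math


def sinc(x):
    """sinc(x) = sin(pi x)/(pi x), the convention used after (4)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def gaussian_error(n, var=1.0):
    """Approximation (15) for an n-level Gaussian quantizer."""
    return 2.73 * n * var / (n + 0.853) ** 3


def rayleigh_error(n, var=1.0):
    """Approximation (16) for an n-level Rayleigh quantizer."""
    return 0.9287 * n * var / (0.596 + n) ** 3


def best_polar(n_total, var=1.0):
    """Search N_r with N_theta = floor(N / N_r); returns (error, N_r, N_theta)."""
    best = None
    for n_r in range(1, n_total + 1):
        n_theta = n_total // n_r
        s2 = sinc(1.0 / n_theta) ** 2
        # Eq. (12), using mean(r^2) = 2*var for the Rayleigh magnitude.
        err = s2 * rayleigh_error(n_r, var) + (1.0 - s2) * 2.0 * var
        if best is None or err < best[0]:
            best = (err, n_r, n_theta)
    return best


def best_rectangular(n_total, var=1.0):
    """Search N_x with N_y = floor(N / N_x); returns (error, N_x, N_y)."""
    best = None
    for n_x in range(1, math.isqrt(n_total) + 1):
        n_y = n_total // n_x
        # Assumption: the two components are quantized independently.
        err = gaussian_error(n_x, var) + gaussian_error(n_y, var)
        if best is None or err < best[0]:
            best = (err, n_x, n_y)
    return best


if __name__ == "__main__":
    for n in (256, 1024, 2000):
        print(n, best_polar(n), best_rectangular(n))
```

For a perfect square such as $N = 1024$ the rectangular search settles on $N_x = N_y = 32$, in line with (13), and the polar error found by the search is smaller.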

Table I summarizes the results of the last two sections. In the first column we find the values of $N$ for which the Cartesian format error is smaller than the polar format error according to the exact error expressions. In the second column we find the values of $N$ for which rectangular format is better than polar format according to the approximate error expressions. It can be seen that, in general, the approximate expressions are more pessimistic than the exact quantities.

Fig. 1. Ratio of optimum number of phase quantizer levels to magnitude quantizer levels as a function of N.


V. Magnitude-Phase Information Comparison

An interesting problem that arises in using the polar representation is to find the best choice for the ratio of phase quantization levels to magnitude quantization levels. Pearlman [2] used distortion rate theory to obtain the ratio $N_\theta/N_r = 2.596$. We now give a somewhat different derivation.

We minimize (12), assuming $N$ is large. We note that, for large $N_\theta$,

$$\mathrm{sinc}^2(1/N_\theta) \approx 1 - \frac{(\pi/N_\theta)^2}{3},$$

and that $\overline{r^2} = 2\sigma^2$ for the Rayleigh density. Using these approximations, and (16) together with (12), we obtain

$$E_p \approx \left( 1 - \frac{(\pi/N_\theta)^2}{3} \right) \frac{0.9287 \, N_r \sigma^2}{(0.5965 + N_r)^3} + \frac{(\pi/N_\theta)^2}{3} \, 2\sigma^2. \tag{19}$$

Assuming $(0.5965 + N_r)^3 \approx N_r^3$, we substitute $N_r = N/N_\theta$ into (19), differentiate with respect to $N_\theta$, and set the resulting expression equal to zero. Solving for $N_\theta$, we find

$$N_\theta = 1.63 \, N^{1/2} \tag{20a}$$

or

$$\frac{N_\theta}{N_r} = 2.662, \tag{20b}$$

which agrees closely with the Pearlman bound. Fig. 1 shows a computer plot of the actual ratio plotted as a function of $N$. The dotted line is the value 2.662. Using this value in (19), it is a simple matter to show that, for large $N$, the polar format error is smaller than the rectangular format error.
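The constants in (20a) and (20b) can be checked directly. The sketch below assumes that the leading terms of (19) after the substitution $N_r = N/N_\theta$ are $0.9287\,\sigma^2 N_\theta^2/N^2 + 2\sigma^2\pi^2/(3N_\theta^2)$, which gives $N_\theta^2/N = \sqrt{2\pi^2/(3 \cdot 0.9287)}$ in closed form, and it also minimizes the full expression (19) over integer $N_\theta$. The function names and the search range are illustrative choices.

```python
import math

A_RAYLEIGH = 0.9287   # numerator constant of (16)
OFFSET = 0.5965       # denominator offset as written in (19)


def asymptotic_ratio():
    """Closed-form N_theta / N_r from the leading terms of (19)."""
    return math.sqrt(2.0 * math.pi ** 2 / (3.0 * A_RAYLEIGH))


def polar_error_19(n_theta, n_total, var=1.0):
    """Right-hand side of (19) with N_r = N / N_theta (treated as real)."""
    n_r = n_total / n_theta
    c = (math.pi / n_theta) ** 2 / 3.0
    rayleigh = A_RAYLEIGH * n_r * var / (OFFSET + n_r) ** 3
    return (1.0 - c) * rayleigh + c * 2.0 * var


def numerical_ratio(n_total):
    """Minimize (19) over integer N_theta and return N_theta^2 / N."""
    # Assumed search window: the optimum lies near 1.63*sqrt(N).
    upper = 4 * math.isqrt(n_total)
    best = min(range(2, upper), key=lambda k: polar_error_19(k, n_total))
    return best * best / n_total


if __name__ == "__main__":
    print(asymptotic_ratio(), math.sqrt(asymptotic_ratio()))
    print(numerical_ratio(10 ** 6))
```

The closed form reproduces 2.662 and its square root reproduces the coefficient 1.63; the integer search on the full expression lands close to the same ratio for large $N$.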


VI. Applications to Spectral Phase Coding

From the preceding sections, we know that if 3.33 bits or more per sample is to be used to quantize a white Gaussian sequence, it is better to pair the members of the sequence and quantize them in a polar format rather than simply quantizing the samples individually. We also know that the phase information is much more important than the magnitude information for minimizing the mean square quantization error.

Spectral phase coding (SPC) [1], [7] is one way in which we may make use of the above two properties. Consider some arbitrary data sequence $c_0, c_1, \ldots, c_{L-1}$, where in our examples we let $L = 4096$. The message sequence is divided into blocks of $N$ samples; we consider the case $N = 32$. Each block of $N$ terms is then divided in half, with the first $N/2$ terms forming the sequence $\{a_{1,n}\}_{n=0}^{(N/2)-1}$ and the second group of $N/2$ terms forming $\{a_{2,n}\}_{n=0}^{(N/2)-1}$. The complex-valued sequence $\{a_n\}_{n=0}^{(N/2)-1}$ is formed from

$$a_n = a_{1,n} + i \, a_{2,n}. \tag{21}$$

We then form the spectral sequence $\{A_p \exp(i\theta_p)\}$ from

$$A_p \exp(i\theta_p) = \sum_{n=0}^{(N/2)-1} a_n \exp(-i 4\pi n p / N), \qquad p = 0, \ldots, \frac{N}{2} - 1. \tag{22}$$

The SPC sequence is described by the following equations:

$$a_n = \frac{2}{N} \sum_{p=0}^{(N/2)-1} \frac{S}{2} \left[ \exp(i\psi_p) + \exp(i\psi_{p+(N/2)}) \right] \exp(i 4\pi n p / N), \qquad n = 0, \ldots, \frac{N}{2} - 1, \tag{23}$$

where

$$S = \max_p \{A_p\}, \tag{24a}$$

$$\psi_p = \theta_p + \phi_p, \tag{24b}$$

$$\psi_{p+(N/2)} = \theta_p - \phi_p, \tag{24c}$$

and $\phi_p = \cos^{-1}(A_p/S)$. Equation (24) describes the coding procedure and (23) the decoding procedure.
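The coding and decoding steps in (21) through (24) can be sketched as follows. This is a minimal illustration without any phase quantization, written under the assumption that (24b) and (24c) take the form $\psi_p = \theta_p + \phi_p$ and $\psi_{p+(N/2)} = \theta_p - \phi_p$; the function names `spc_encode` and `spc_decode` are hypothetical and a direct $O(N^2)$ transform is used for clarity.

```python
import cmath
import math


def spc_encode(block):
    """Map a real block of even length N to (S, [psi_0, ..., psi_{N-1}])."""
    n = len(block)
    half = n // 2
    # (21): pair the two halves of the block into a complex sequence.
    a = [complex(block[k], block[k + half]) for k in range(half)]
    # (22): length-N/2 DFT, written with the exp(-i 4 pi n p / N) kernel.
    spectrum = [sum(a[k] * cmath.exp(-4j * math.pi * k * p / n)
                    for k in range(half))
                for p in range(half)]
    s = max(abs(z) for z in spectrum)              # (24a)
    psi = [0.0] * n
    for p, z in enumerate(spectrum):
        theta = cmath.phase(z)
        phi = math.acos(min(1.0, abs(z) / s)) if s > 0 else 0.0
        psi[p] = theta + phi                       # (24b), assumed form
        psi[p + half] = theta - phi                # (24c), assumed form
    return s, psi


def spc_decode(s, psi):
    """Invert spc_encode using (27) followed by the inverse transform (23)."""
    n = len(psi)
    half = n // 2
    spectrum = [0.5 * s * (cmath.exp(1j * psi[p]) + cmath.exp(1j * psi[p + half]))
                for p in range(half)]
    a = [(2.0 / n) * sum(spectrum[p] * cmath.exp(4j * math.pi * k * p / n)
                         for p in range(half))
         for k in range(half)]
    return [z.real for z in a] + [z.imag for z in a]


if __name__ == "__main__":
    block = [math.sin(0.7 * k) + 0.1 * k for k in range(32)]
    s, psi = spc_encode(block)
    rebuilt = spc_decode(s, psi)
    print(max(abs(x - y) for x, y in zip(block, rebuilt)))
```

Because $(S/2)[\exp(i(\theta+\phi)) + \exp(i(\theta-\phi))] = S\cos\phi\,\exp(i\theta)$, the unquantized decoder returns the original block up to rounding error; a quantized coder would round each $\psi$ before decoding.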

SPC is essentially a polar format representation of the discrete Fourier transform (DFT) of a random phase time series. In [8] the conditions under which the real and imaginary parts of the samples from the DFT tend to independent normal random variables are discussed. This is an asymptotic result, and it tells us that the magnitude of the DFT is Rayleigh and independent of the uniformly distributed phase. The uniform $(-\pi, \pi)$ distribution of the phase makes it a simple matter to quantize this quantity in an optimum fashion. Because of the relatively high phase information content, this case of quantization is important. Indeed, as is shown in Section IV, as long as the phase is optimally quantized, the quantizer characteristics for the magnitude component are much less important. In addition to the uniform phase property for the asymptotic case, we can show that in some special cases the phase has this property for small as well as large $N$.

Consider (22). We assume $a_n$ can be represented as $r_n \exp(i\theta_n)$ where $\theta_n$ is uniform and independent of $r_n$ and of $\theta_i$ for all $i \ne n$. Under these assumptions, we have the following theorem.

Theorem: $A_p$ is independent of $\theta_p$, and $\theta_p$ is uniformly distributed for any arbitrary block size $N$.

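Proof:

$$\mathrm{Re}\{A_p \exp(i\theta_p)\} = \sum_{k=0}^{(N/2)-1} r_k \cos\psi_k \tag{25a}$$

$$\mathrm{Im}\{A_p \exp(i\theta_p)\} = \sum_{k=0}^{(N/2)-1} r_k \sin\psi_k \tag{25b}$$

where

$$\psi_k = \theta_k - \frac{4\pi}{N} k p. \tag{25c}$$

Consider the joint characteristic function of these two random variables:

$$\begin{aligned}
\Phi(\omega_1, \omega_2) &= E_r E_\psi \left\{ \exp\left( i \left( \omega_1 \mathrm{Re}\{A_p \exp(i\theta_p)\} + \omega_2 \mathrm{Im}\{A_p \exp(i\theta_p)\} \right) \right) \right\} \\
&= E_r \left\{ \frac{1}{(2\pi)^{N/2}} \int \cdots \int \exp\left( i \sum_{k=0}^{(N/2)-1} r_k \left[ \omega_1 \cos\psi_k + \omega_2 \sin\psi_k \right] \right) d\psi_0 \cdots d\psi_{(N/2)-1} \right\} \\
&= E_r \left\{ \frac{1}{(2\pi)^{N/2}} \int \cdots \int \exp\left( i \sum_{k=0}^{(N/2)-1} r_k \sqrt{\omega_1^2 + \omega_2^2} \, \cos\left( \psi_k - \tan^{-1}\frac{\omega_2}{\omega_1} \right) \right) d\psi_0 \cdots d\psi_{(N/2)-1} \right\} \\
&= E_r \left\{ \prod_{k=0}^{(N/2)-1} J_0\left( r_k \sqrt{\omega_1^2 + \omega_2^2} \right) \right\}
\end{aligned} \tag{26}$$

where $E_r$ and $E_\psi$ are the expectation operators over the subscripted random variables. However, this is circularly symmetric. Using the properties of the two-dimensional Fourier transform, we know that the bivariate density must also be circularly symmetric. However, this can happen if and only if the magnitude is independent of the phase and the phase is uniformly distributed over a region of support $2\pi$.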

This theorem tells us that with the given assumptions, we can guarantee that the optimal transform phase quantizer is the uniform quantizer. In many cases, experimental data indicate that we are not far from the optimum result even when the conditions for the theorem do not hold for a particular sequence.

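We now derive an expression for the quantizing error of the SPC representation. The ideal unquantized SPC representation is

$$A_p \exp(i\theta_p) = \frac{S}{2} \left[ \exp(i\psi_p) + \exp(i\psi_{p+(N/2)}) \right]. \tag{27}$$

To begin with, we assume that the phase terms $\{\psi_p\}$ are quantized to $M$ equal step size quantization levels. From [2] we have

$$\exp(i\hat{\psi}_p) = \sum_{m=-\infty}^{\infty} \mathrm{sinc}(m + 1/M) \exp(i(mM+1)\psi_p), \tag{28}$$

where $\hat{\psi}_p$ is the quantized version of $\psi_p$. From experimental results it is found that the effect of quantizing the $S$ parameter is negligible, and it will henceforth be ignored. The quantization error $E$ can now be expressed as

$$E = \hat{A}_p \exp(i\hat{\theta}_p) - A_p \exp(i\theta_p), \tag{29}$$

where $\hat{A}_p \exp(i\hat{\theta}_p)$ represents $A_p \exp(i\theta_p)$ using the quantized parameter $\hat{\psi}_p$. Using (28) in (29) we have

$$E = \frac{S}{2} \sum_{m \ne 0} \mathrm{sinc}(m + 1/M) \left[ \exp(i(mM+1)\psi_p) + \exp(i(mM+1)\psi_{p+(N/2)}) \right] - \left( 1 - \mathrm{sinc}(1/M) \right) \frac{S}{2} \left( \exp(i\psi_p) + \exp(i\psi_{p+(N/2)}) \right). \tag{30}$$

We square this quantity and take its expectation, using the following expressions derived in Appendix A:

$$E\left\{ \left| \frac{S}{2} \sum_{m \ne 0} \mathrm{sinc}(m + 1/M) \left[ \exp(i(mM+1)\psi_p) + \exp(i(mM+1)\psi_{p+(N/2)}) \right] \right|^2 \right\} = \frac{S^2}{2} \sum_{l \ne 0} \mathrm{sinc}^2(l + 1/M) \left[ 1 + \cos(2\phi_p(Ml+1)) \right], \tag{31}$$

$$E\left\{ \frac{S}{2} \left( \exp(i\psi_p) + \exp(i\psi_{p+(N/2)}) \right) \frac{S}{2} \sum_{m \ne 0} \mathrm{sinc}(m + 1/M) \left[ \exp(-i(mM+1)\psi_p) + \exp(-i(mM+1)\psi_{p+(N/2)}) \right] \right\} = 0, \tag{32}$$

and

$$E\left\{ \left( 1 - \mathrm{sinc}(1/M) \right)^2 \left| \frac{S}{2} \left( \exp(i\psi_p) + \exp(i\psi_{p+(N/2)}) \right) \right|^2 \right\} = \left( 1 - \mathrm{sinc}(1/M) \right)^2 E\{A_p^2\}, \tag{33}$$

where $E\{\cdot\}$ is the statistical expectation operator. Then

$$E_s = E\{A_p^2\} \left( 1 - \mathrm{sinc}(1/M) \right)^2 + \frac{S^2}{2} \sum_{l \ne 0} \mathrm{sinc}^2(l + 1/M) \, E\{1 + \cos[2(lM+1)\phi_p]\}. \tag{34}$$

From the Riemann-Lebesgue lemma [9] we know that, for large $M$, $E\{\cos 2(lM+1)\phi_p\} \ll 1$. Also,

$$\sum_{l \ne 0} \mathrm{sinc}^2(l + 1/M) = 1 - \mathrm{sinc}^2(1/M) = \left( 1 - \mathrm{sinc}(1/M) \right) \left( 1 + \mathrm{sinc}(1/M) \right) \tag{35}$$

$$\approx 2 \left( 1 - \mathrm{sinc}(1/M) \right), \tag{36}$$

so that

$$E_s = E\{A_p^2\} \left( 1 - \mathrm{sinc}(1/M) \right)^2 + S^2 \left( 1 - \mathrm{sinc}(1/M) \right). \tag{37}$$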
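Two ingredients of this derivation lend themselves to a quick numerical check: the series (28) for a uniformly quantized phase, and the sum used in (35). The sketch below assumes the phase quantizer rounds to the nearest multiple of $2\pi/M$, and it truncates both sums symmetrically; the function names and the truncation lengths are illustrative choices. Convergence of (28) is slow, on the order of one over the number of terms, and degrades near the quantizer decision boundaries.

```python
import cmath
import math


def sinc(x):
    """sinc(x) = sin(pi x)/(pi x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def quantize_phase(psi, m):
    """Round psi to the nearest multiple of 2*pi/m (assumed quantizer)."""
    step = 2.0 * math.pi / m
    return step * round(psi / step)


def series_28(psi, m, terms):
    """Symmetric partial sum of the right-hand side of (28)."""
    return sum(sinc(k + 1.0 / m) * cmath.exp(1j * (k * m + 1) * psi)
               for k in range(-terms, terms + 1))


def residual_power(m, terms):
    """Partial sum of sinc^2(l + 1/M) over l != 0, the left side of (35)."""
    return sum(sinc(l + 1.0 / m) ** 2
               for l in range(-terms, terms + 1) if l != 0)


if __name__ == "__main__":
    m, psi = 8, 0.1
    print(series_28(psi, m, 2000), cmath.exp(1j * quantize_phase(psi, m)))
    x = sinc(1.0 / 16)
    print(residual_power(16, 100000), 1.0 - x * x, 2.0 * (1.0 - x))
```

The last line prints the truncated sum, the exact value $1 - \mathrm{sinc}^2(1/M)$, and the approximation $2(1 - \mathrm{sinc}(1/M))$ from (36) side by side for $M = 16$.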


TABLE III

A Comparison of Normalized Quantization Error for an SPC Sequence and an Optimal Unit Variance Gaussian Quantizer for Different Probability Densities

[Table body illegible in the source scan. The rows list Gaussian N(0, A), uniform U(-A/2, A/2), and double sided exponential X(A) densities, and the columns give the density and the normalized errors of the two coders. The only clearly legible entry is the Max quantizer error of 0.91 E-2 for the N(0, 1) density.]

This error expression agrees extremely closely with computer simulations and with the error expression found in [1, eq. (22)], which is derived by a different method. The second term contributes the most to the error.

We now present examples that make use of a sequence of 4096 zero mean, unit variance Gaussian random variables. We first form the SPC version of this sequence allowing four bits per SPC sample. The error expression in (37) predicts a mean square error of $2.2 \times 10^{-2}$ per sample. The actual computed average error per sample for SPC block sizes of 32 is $2.3 \times 10^{-2}$. An optimal Max [4] quantizer would give an error of $0.91 \times 10^{-2}$ per sample. By using SPC we create only a little over twice the minimum achievable error for this signal and this number of quantization levels. However, if the signal statistics change and the same quantizers are employed, what is the expected result?

Table III summarizes a number of computer simulations for Gaussian, double sided exponential, and uniform random variables coded using both the optimal unit variance Gaussian-Max quantizer and SPC. $N(0, A)$ is the zero mean, variance of $A$, Gaussian density; $U(-A/2, A/2)$ is the zero mean, variance of $A^2/12$, uniform density; and $X(A)$ is the zero mean, variance of $1/A^2$, double sided exponential density.

For this example, one can see that the large variance signals have lower quantizing error if coded with SPC. Because the Max-Gaussian quantizer has very small step sizes near the origin, we expect that it will produce small errors for those signals that have a large amount of probability in that region. The most striking characteristic of these results is the way the normalized SPC mean square error remains virtually constant for each particular distribution. SPC tracks variations in signal power very well.




VII.  Conclusion 

In this paper we have investigated in detail the optimum quantization of two-dimensional Gaussian random variables. Results are put forward to prove that, in general, polar format is superior to rectangular format. Applications of this to a coding scheme (SPC) are studied in order to explain why SPC seems to exhibit robustness with respect to variations in signal statistics and signal power.

Appendix  A 

We will now derive (31). Taking the square of the expression and moving the expectation operator through the sum leaves

$$\begin{aligned}
\frac{S^2}{4} \sum_{m \ne 0} \sum_{l \ne 0} \mathrm{sinc}(m + 1/M) \, \mathrm{sinc}(l + 1/M) \, E\Big\{ & \exp(iM(m-l)\psi_p) \\
& + \exp\left( iM(m\psi_p - l\psi_{p+(N/2)}) \right) \exp\left( i(\psi_p - \psi_{p+(N/2)}) \right) \\
& + \exp\left( -iM(l\psi_p - m\psi_{p+(N/2)}) \right) \exp\left( -i(\psi_p - \psi_{p+(N/2)}) \right) \\
& + \exp\left( iM(m-l)\psi_{p+(N/2)} \right) \Big\}.
\end{aligned} \tag{A1}$$

Assume that $A_p$ is independent of $\theta_p$. This means that the $\theta_p$ and $\phi_p$ used in the expressions for $\psi_p$ and $\psi_{p+(N/2)}$ are also independent. Therefore, the expectation in (A1) is zero except for those terms where $l = m$. Consequently, this expression is equivalent to

$$\frac{S^2}{4} \sum_{l \ne 0} \mathrm{sinc}^2(l + 1/M) \, E\left\{ 2 + 2\cos\left[ (Ml+1)(\psi_p - \psi_{p+(N/2)}) \right] \right\}. \tag{A2}$$

Because

$$\psi_p - \psi_{p+(N/2)} = 2\phi_p, \tag{A3}$$

we have

$$\frac{S^2}{2} \sum_{l \ne 0} \mathrm{sinc}^2(l + 1/M) \left[ 1 + \cos 2\phi_p(Ml+1) \right]. \tag{A4}$$

Equation (32) is obtained by a similar argument. For (33), we recognize that

$$\frac{S}{2} \left( \exp(i\psi_p) + \exp(i\psi_{p+(N/2)}) \right) = A_p \exp(i\theta_p). \tag{A5}$$

Therefore,

$$E\left\{ \left( 1 - \mathrm{sinc}(1/M) \right)^2 \left| A_p \exp(i\theta_p) \right|^2 \right\} = \left( 1 - \mathrm{sinc}(1/M) \right)^2 E\{A_p^2\}. \tag{A6}$$


References

[1] N. C. Gallagher, "Quantizing schemes for the discrete Fourier transform of a random time series," IEEE Trans. Inform. Theory, vol. IT-24, pp. 156-163, Mar. 1978.

[2] W. A. Pearlman, "Quantization error bounds for computer generated holograms," Stanford Univ. Inform. Syst. Lab., Stanford, CA, Tech. Rep. #6503-1, Aug. 1974.

[3] J. A. Bucklew and N. C. Gallagher, "A note on optimum quantization," IEEE Trans. Inform. Theory, vol. IT-25, pp. 365-366, May 1979.

[4] J. Max, "Quantization for minimum distortion," IRE Trans. Inform. Theory, vol. IT-6, pp. 7-12, Mar. 1960.

[5] P. E. Fleischer, "Sufficient conditions for achieving minimum distortion in a quantizer," IEEE Int. Conv. Rec., Part I, pp. 104-111, 1964.

[6] R. C. Wood, "On optimum quantization," IEEE Trans. Inform. Theory, vol. IT-15, pp. 248-252, Mar. 1969.

[7] N. C. Gallagher, "Discrete spectral phase coding," IEEE Trans. Inform. Theory, vol. IT-22, pp. 622-624, Sept. 1976.

[8] N. C. Gallagher and B. Liu, "Statistical properties of the Fourier transform on random phase diffusers," Optik, vol. 42, pp. 65-86, Feb. 1975.

[9] H. L. Royden, Real Analysis. Toronto: MacMillan, 1968, p. 90.


IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-25, NO. 6, NOVEMBER 1979   667


Two-Dimensional  Quantization  of  Bivariate 
Circularly  Symmetric  Densities 

JAMES A. BUCKLEW and NEAL C. GALLAGHER, JR., MEMBER, IEEE


Abstract: The problem of quantizing a two-dimensional random variable
whose bivariate density has circular symmetry is considered in detail.
Two quantization methods are considered, leading to polar and rectangular
representations. A simple necessary and sufficient condition is derived to
determine which of these two quantization schemes is best. If polar
quantization is deemed best, the question arises as to the ratio of the
number of phase quantizer levels to that of magnitude quantizer levels
when the product of these numbers is fixed. A simple expression is derived
for this ratio that depends only upon the magnitude distribution. Several
examples of common circularly symmetric bivariate densities are worked
out in detail using these expressions.


I.  Introduction 

CONSIDER  a  two-dimensional  random  variable  X 
whose  bivariate  density  is  circularly  symmetric.  We 
desire to represent this quantity by a finite set of values.
One  possible  representation  of  X  leads  to  a  Cartesian 
coordinate  system  expression  wherein  we  individually 
quantize  the  two  rectangular  components  of  the  random 
variable.  Another  common  representation  leads  to  a  polar 
coordinate representation where we quantize the magnitude
and phase angle of X. These two representations are
chosen  mainly  for  their  computational  feasibility  and  ease 
of  implementation.  Other  authors  have  considered  the 
general  problem  of  multidimensional  quantization.  Zador 
[1] derives an expression for the minimum error achievable
by a multidimensional quantizer for an arbitrary
density,  but  no  insight  into  the  required  quantizer  struc¬ 
ture  is  attained.  Chen  [2]  describes  a  recursive  computer 
technique  to  solve  for  a  “good”  quantizer,  but  the  opti¬ 
mality  of  the  final  solution  is  not  assured.  By  constraining 
ourselves  to  circularly  symmetric  densities  and  also  to 
either  Cartesian  or  polar  coordinate  quantization  schemes, 
it  becomes  possible  to  reduce  the  optimal  two-dimensional 
quantization problem to one dimension. Max [3] develops
necessary conditions for the optimality of a one-dimensional
quantizer. Panter and Dite [4] give a formula for
the  asymptotic  error  to  be  expected  for  optimal  mean 
square  error  quantizers  (of  sufficiently  smooth  input  den¬ 
sities). 

In  Section  II  we  obtain  a  simple  criterion  by  which  to 
determine  whether  polar  format  or  rectangular  format 
gives a smaller mean square quantization error. It is

Manuscript received September 12, 1978; revised April 2, 1979. This
work was supported by the Air Force Office of Scientific Research under
Grant AFOSR 78-3605.

The authors are with the Department of Electrical Engineering,
Purdue University, West Lafayette, IN 47907.


shown  that  for  some  very  important  cases,  notably  for  the 
Gaussian  bivariate  density,  the  polar  format  is  asymptoti¬ 
cally  superior. 

If polar format is to be used and the product $N = N_\theta N_r$
is fixed, where $N_\theta$ and $N_r$ are the numbers of phase and
magnitude quantization levels, respectively, the question
arises as to the optimum ratio $N_\theta/N_r$. We derive a simple
expression for this ratio that depends only upon the
magnitude density.

In  Section  III  we  provide  several  examples  of  common 
circularly symmetric densities (e.g., marginal densities are
Pearson  II,  Pearson  VII,  sinusoidal,  and  Gaussian),  and 
we  address  the  question  of  whether  the  rectangular  or  the 
polar  format  scheme  gives  a  smaller  quantization  error. 


II.  Development 


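Consider the mean square quantization error $E_p$ of a
polar format representation of the two-dimensional random
variable $X = r\exp(i\theta)$:

$$E_p = \sum_{j=1}^{N_\theta} \sum_{i=1}^{N_r} \int_{c_{j-1}}^{c_j} \int_{a_{i-1}}^{a_i} \left| r e^{i\theta} - b_i e^{i d_j} \right|^2 \frac{f_r(r)}{2\pi}\, dr\, d\theta. \tag{1}$$

Implicit use has been made of the fact that in circularly
symmetric bivariate densities the magnitude random variable
with probability density $f_r(\cdot)$ is independent of the
phase random variable, which is uniformly distributed on $[-\pi, \pi]$. The
$b_i$ and $d_j$ are the output levels of the magnitude and phase
quantizers corresponding to input levels lying in the intervals
$(a_{i-1}, a_i]$ and $(c_{j-1}, c_j]$, respectively. Integrating over
the $\theta$ variable, (1) becomes

$$E_p = \sum_{j=1}^{N_\theta} \sum_{i=1}^{N_r} \int_{a_{i-1}}^{a_i} \Big[ (r^2 + b_i^2)(c_j - c_{j-1}) - 2 r b_i \big[ \sin(c_j - d_j) - \sin(c_{j-1} - d_j) \big] \Big] \frac{f_r(r)}{2\pi}\, dr. \tag{2}$$

It is shown in [5] that the optimal phase quantizer is the
uniform quantizer. This means that $c_j - c_{j-1} = 2\pi/N_\theta$ and
$c_j - d_j = -(c_{j-1} - d_j) = \pi/N_\theta$, for $j = 1, \ldots, N_\theta$. This allows
us to simplify (2):

$$E_p = \sum_{i=1}^{N_r} \int_{a_{i-1}}^{a_i} \left[ r^2 + b_i^2 - 2 r b_i \frac{\sin(\pi/N_\theta)}{\pi/N_\theta} \right] f_r(r)\, dr. \tag{3}$$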
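The step from the double integral over magnitude and phase to the single-integral form can be checked numerically. The sketch below assumes a uniform phase quantizer whose output phases sit at the cell midpoints, and compares a brute-force average of the squared error over the phase with the closed-form integrand $r^2 + b^2 - 2rb\,\sin(\pi/N_\theta)/(\pi/N_\theta)$. The function names and the sample values of `r`, `b`, and `n_theta` are hypothetical and chosen only for illustration.

```python
import math

def phase_averaged_error(r, b, n_theta, steps=2000):
    """Average |r e^{i theta} - b e^{i d_j}|^2 over a uniform phase on [-pi, pi).

    Each phase cell has width 2*pi/n_theta and its output phase d_j is the
    cell midpoint. The integral over each cell uses the midpoint rule.
    """
    width = 2.0 * math.pi / n_theta
    h = width / steps
    total = 0.0
    for j in range(n_theta):
        lo = -math.pi + j * width
        d_j = lo + width / 2.0
        for s in range(steps):
            theta = lo + (s + 0.5) * h
            total += (r * r + b * b - 2.0 * r * b * math.cos(theta - d_j)) * h
    return total / (2.0 * math.pi)

def closed_form_error(r, b, n_theta):
    """Integrand of the simplified expression for a uniform phase quantizer."""
    x = math.pi / n_theta
    return r * r + b * b - 2.0 * r * b * math.sin(x) / x

numeric = phase_averaged_error(1.3, 0.9, 8)
exact = closed_form_error(1.3, 0.9, 8)
print(numeric, exact)
```

The two values should agree to within the accuracy of the midpoint rule, which is consistent with the phase quantizer contributing only through the factor $\sin(\pi/N_\theta)/(\pi/N_\theta)$.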


Differentiating with respect to $b_i$, we find the optimum $b_i$:


0018-9448/79/1 100-0667$00.75  ©1979  IEEE 




$$b_i = \frac{\sin(\pi/N_\theta)}{\pi/N_\theta} \cdot \frac{\int_{a_{i-1}}^{a_i} r f_r(r)\, dr}{\int_{a_{i-1}}^{a_i} f_r(r)\, dr}. \tag{4}$$

The equation given by Max for the output levels $b_i'$ of an
optimal one-dimensional magnitude quantizer is found in
[3] to be

$$b_i' = \frac{\int_{a_{i-1}'}^{a_i'} r f_r(r)\, dr}{\int_{a_{i-1}'}^{a_i'} f_r(r)\, dr}, \tag{5}$$

where the optimal input interval endpoints $a_i'$ (for the
one-dimensional case) satisfy

$$a_i' = \frac{b_i' + b_{i+1}'}{2}. \tag{6}$$

If we minimize (3) with respect to the $a_i$, we arrive at
the necessary condition (for the two-dimensional case)

$$a_i = \frac{b_i + b_{i+1}}{2\,\dfrac{\sin(\pi/N_\theta)}{\pi/N_\theta}}. \tag{7}$$

This equation indicates that the quantizer interval endpoints
for the optimum magnitude quantizer in the two-dimensional
case are the same as the quantizer interval
endpoints for the optimum one-dimensional quantizer.
From (4) and (5) and the preceding discussion, we have
the following relationship between the output levels $b_i'$ and $b_i$:

$$b_i = \frac{\sin(\pi/N_\theta)}{\pi/N_\theta}\, b_i'. \tag{8}$$

Consequently, (3) becomes

$$E_p = E\{r^2\} - \left( \frac{\sin(\pi/N_\theta)}{\pi/N_\theta} \right)^2 \sum_{i=1}^{N_r} (b_i')^2 \int_{a_{i-1}}^{a_i} f_r(r)\, dr, \tag{9}$$

where $E\{\cdot\}$ is the statistical expectation operator. In [6] it
is shown that the mean square quantization error for a
minimum mean square error quantizer is simply the input
mean square value minus the output mean square value. If
we denote by $E_N^X$ the mean square quantization error
produced by an optimal $N$-level quantizer for the random
variable $X$, we may rewrite (9) as

$$E_p = \left( \frac{\sin(\pi/N_\theta)}{\pi/N_\theta} \right)^2 \left[ E\{r^2\} - \sum_{i=1}^{N_r} (b_i')^2 \int_{a_{i-1}}^{a_i} f_r(r)\, dr \right] + \left[ 1 - \left( \frac{\sin(\pi/N_\theta)}{\pi/N_\theta} \right)^2 \right] E\{r^2\}$$
$$\phantom{E_p} = \left( \frac{\sin(\pi/N_\theta)}{\pi/N_\theta} \right)^2 E_{N_r}^{r} + \left[ 1 - \left( \frac{\sin(\pi/N_\theta)}{\pi/N_\theta} \right)^2 \right] E\{r^2\}. \tag{10}$$

Our problem is now one of characterizing the quantity
$E_{N_r}^{r}$. Panter and Dite [4] give a formula for the expected
error of a minimum mean square error quantizer with a
large number of output levels and a smooth input density.
This formula is

$$E_N^X = \frac{1}{12 N^2} \left[ \int f_X^{1/3}(x)\, dx \right]^3 = \frac{K_X}{N^2}. \tag{11}$$

Roe [7] also derives some asymptotic formulas which were
later used by Wood [8] to rederive (11). Roe's formulas
depend on the truncation of a Taylor series expansion of
the input density. Wood, in his formula, explicitly states
that the input density and the first few derivatives (up to
order five in some cases) must exist and be continuous.
Panter and Dite require that, as the input intervals become
very small, the density function may be approximated
as a constant over each interval. In [1] it is shown
that a sufficient condition for (11) to hold is that $f(x)$ be
Riemann integrable, a much less severe restriction than
continuity or differentiability.

We make use of the approximation

$$\left( \frac{\sin x}{x} \right)^2 \approx 1 - \frac{x^2}{3} \tag{12}$$

and of (11) in order to reduce (10) to

$$E_p \approx \left( 1 - \frac{\pi^2}{3 N_\theta^2} \right) \frac{K_r}{N_r^2} + \frac{2\pi^2}{3 N_\theta^2}, \tag{13}$$

where we assume $E\{r^2\} = 2$ (this implies unit variance
rectangular marginal densities). If we let $N$ be the total
number of output levels allowed to represent the
two-dimensional random variable $X$, we have the relation

$$N = N_r N_\theta. \tag{14}$$

Since $K_r > 0$, it is simple to show that $N_r = O(N^{1/2})$ and
$N_\theta = O(N^{1/2})$ by differentiating (13) and solving for the
optimal quantities. Making use of this fact and (14), we
may, assuming sufficiently large $N$, write (13) as

$$E_p \approx \frac{K_r N_\theta^2}{N^2} + \frac{2\pi^2}{3 N_\theta^2}. \tag{15}$$

This is then optimized with respect to $N_\theta$ and yields the
optimal $N_\theta^2$ as

$$N_\theta^2 = \sqrt{\frac{2}{3 K_r}}\; \pi N. \tag{16}$$
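The allocation of levels between phase and magnitude can be illustrated with a short numerical sketch. It assumes the large-$N$ polar error $E_p \approx K_r N_\theta^2/N^2 + 2\pi^2/(3N_\theta^2)$ with $N_r = N/N_\theta$, treats $N_\theta$ as a real number, and checks that $N_\theta^2 = \pi N \sqrt{2/(3K_r)}$ minimizes it. The value `k_r = 0.9` and the function names are hypothetical.

```python
import math

def polar_error(k_r, n_total, n_theta):
    """Large-N polar-format error with N_r = N / N_theta."""
    n_r = n_total / n_theta
    return k_r / n_r ** 2 + 2.0 * math.pi ** 2 / (3.0 * n_theta ** 2)

def optimal_n_theta(k_r, n_total):
    """Real-valued N_theta that minimizes polar_error for fixed N."""
    return math.sqrt(math.pi * n_total * math.sqrt(2.0 / (3.0 * k_r)))

k_r, n_total = 0.9, 1024.0          # illustrative values only
best = optimal_n_theta(k_r, n_total)
min_error = polar_error(k_r, n_total, best)

# Brute-force comparison on a grid from 0.5*best to 1.5*best.
grid = [best * (0.5 + 0.01 * i) for i in range(101)]
grid_min = min(polar_error(k_r, n_total, g) for g in grid)
print(best, min_error, grid_min)
```

In practice $N_\theta$ and $N_r$ must be integers whose product does not exceed $N$, so the real-valued optimum is only a starting point for rounding.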


This leads to the following expression for the minimal
attainable asymptotic polar format error:

$$E_p = \sqrt{\frac{2 K_r}{3}}\; \frac{2\pi}{N}. \tag{17}$$

Now consider the problem of optimally quantizing the
random variable $X$ in a rectangular format. The mean
square quantization error $E_r$ of this representation is given
by

$$E_r = \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \int_{y_{j-1}}^{y_j} \int_{x_{i-1}}^{x_i} \left[ (x - l_i)^2 + (y - h_j)^2 \right] f_{xy}(x, y)\, dx\, dy, \tag{18}$$

where $N_x$ and $N_y$ are the number of levels in each of the
respective orthogonal random variables. The other notation
should be clear. Equation (18) may be written as

$$E_r = \sum_{i=1}^{N_x} \int_{x_{i-1}}^{x_i} (x - l_i)^2 f_x(x)\, dx + \sum_{j=1}^{N_y} \int_{y_{j-1}}^{y_j} (y - h_j)^2 f_y(y)\, dy, \tag{19}$$

where we make use of the fact that the first term in the
bracket in (18) depends only upon $x$ and the second term
depends only upon $y$. By symmetry arguments (since
$f_x(x) = f_y(x)$), we may argue that $N_x = N_y = N^{1/2}$. The
quantizer that minimizes the above equation is simply the
minimum mean square error quantizer for each of the two
components. Therefore, again using (11), we have for
large $N$

$$E_r = \frac{2 K_x}{N}. \tag{20}$$

Comparing (20) and (17), we say that polar format is
asymptotically better than rectangular format if and only
if

$$K_x > \left( \frac{2 K_r \pi^2}{3} \right)^{1/2}. \tag{21}$$

In other words, if the inequality is satisfied and the
original input probability density is Riemann integrable,
then we are guaranteed that there exists an $N_0$ such that
for every $N > N_0$, polar format quantization will perform
better than rectangular format quantization.

If polar quantization is deemed best for a particular
density, then what is the ratio $N_\theta/N_r$ that provides the
smallest total error? This question is answered by the use
of (16); we find

$$\frac{N_\theta}{N_r} = \left( \frac{2\pi^2}{3 K_r} \right)^{1/2}. \tag{22}$$


III. EXAMPLES

For our first example we calculate the relevant parameters
for a random variable whose marginal density is of
Pearson type VII. This distribution is a generalization of
Student's t-distribution. The bivariate density is

$$f(x, y) = \frac{v\, 2^v (v-1)^v}{\pi \left[ 2(v-1) + x^2 + y^2 \right]^{v+1}}, \qquad -\infty < x, y < \infty \tag{23}$$

(with $v > 1$ to assure finite variance) and the marginal
density appears as

$$f(x) = \frac{2^v (v-1)^v\, \Gamma(v + 1/2)}{\sqrt{\pi}\; \Gamma(v) \left[ 2(v-1) + x^2 \right]^{v+1/2}}, \qquad -\infty < x < \infty \tag{24}$$

where $\Gamma(\cdot)$ is the gamma function and where we have
normalized the distribution so that $f(x)$ has unit variance.
The magnitude density is derived by substituting in $r$ for
$\sqrt{x^2 + y^2}$ in $f(x, y)$ and multiplying the result by $2\pi r$, as
shown by a simple change of variable. Equation (24)
yields, after some tedious algebra,

$$K_x = \frac{(v-1) \left[ B\!\left( \tfrac{1}{2}, \tfrac{v-1}{3} \right) \right]^3}{6\, B\!\left( \tfrac{1}{2}, v \right)}, \tag{25}$$

where $B(\cdot\,; \cdot)$ is the beta function. We perform similar
operations with the magnitude density to yield

$$K_r = \frac{v (v-1)}{24} \left[ B\!\left( \tfrac{2}{3}, \tfrac{v-1}{3} \right) \right]^3. \tag{26}$$

In Fig. 1 $K_x$ (solid line) and $(2 K_r \pi^2/3)^{1/2}$ (dotted line) are
plotted as a function of $v$ for values from 1.1 to 21.1. As
shown by this graph, the polar format is always asymptotically
best for this class of distributions. An interesting
point about this set of distributions is that, in the limit as
$v \to \infty$, (23) converges to a unit variance Gaussian density.
Therefore, taking this limit in (25) and making use of
Stirling's approximation, we have

$$K_x = \frac{\sqrt{3}\, \pi}{2} \approx 2.72. \tag{27}$$

Wood [8] estimates this number as 2.73, which is close to
our derived value. From (26) we have similarly

$$K_r = \frac{3}{8}\, \Gamma^3\!\left( \tfrac{2}{3} \right) \approx 0.931, \tag{28}$$

which is the parameter for the Rayleigh distribution obtained
in the limit. Using these two values in (21), we
conclude that asymptotically polar formatting is better
than rectangular formatting for Gaussian bivariate densities.
As a matter of interest, when we substitute the value
of $K_r$ found in (28) into (22), we find the optimal ratio
$N_\theta/N_r$ to be 2.659. Pearlman [9], using distortion rate
theory, states that this ratio should be > 2.596, which is in
agreement with our result.

Fig. 1. Solid line is a plot of $K_x$ as a function of $v$, and dotted line is a
plot of $(2 K_r \pi^2/3)^{1/2}$ as a function of $v$ for the Pearson VII density.

Fig. 2. Solid line is a plot of $K_x$ as a function of $v$, and dotted line is a
plot of $(2 K_r \pi^2/3)^{1/2}$ as a function of $v$ for the Pearson II density.
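The Gaussian figures quoted above can be reproduced numerically. The sketch below assumes the Panter and Dite constant has the form $K = \frac{1}{12}\left[\int f^{1/3}\right]^3$, evaluates it by a midpoint rule for a unit-variance Gaussian marginal and for the corresponding Rayleigh magnitude density, and then forms the level ratio $\pi\sqrt{2/(3K_r)}$ and the error coefficients $2\pi\sqrt{2K_r/3}$ (polar) and $2K_x$ (rectangular). The helper name, integration limits, and step count are assumptions made for this illustration.

```python
import math

def panter_dite_constant(density, lo, hi, steps=100000):
    """K = (1/12) * (integral of density**(1/3))**3, midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    acc = sum(density(lo + (i + 0.5) * h) ** (1.0 / 3.0) for i in range(steps))
    return (acc * h) ** 3 / 12.0

def gaussian(x):
    """Unit-variance Gaussian marginal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def rayleigh(r):
    """Magnitude density of a bivariate Gaussian with unit-variance marginals."""
    return r * math.exp(-r * r / 2.0)

k_x = panter_dite_constant(gaussian, -40.0, 40.0)
k_r = panter_dite_constant(rayleigh, 0.0, 40.0)

ratio = math.pi * math.sqrt(2.0 / (3.0 * k_r))               # N_theta / N_r
polar_coeff = 2.0 * math.pi * math.sqrt(2.0 * k_r / 3.0)     # polar error times N
rect_coeff = 2.0 * k_x                                       # rectangular error times N
print(k_x, k_r, ratio, polar_coeff, rect_coeff)
```

The printed ratio and coefficients can be compared directly with the values stated in the text; small differences in the last digit come from rounding.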

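For the next example, consider distributions of the
Pearson II class. The bivariate density is

$$f(x, y) = \frac{v \left[ 2(v+1) - (x^2 + y^2) \right]^{v-1}}{\pi\, 2^v (v+1)^v}\; U\!\left( 2(v+1) - (x^2 + y^2) \right), \tag{29}$$

where $v > 0$, and $U(\cdot)$ is the unit step function. The
marginal density is

$$f(x) = \frac{\Gamma(v+1) \left( 2(v+1) - x^2 \right)^{v - 1/2}\, U\!\left( 2(v+1) - x^2 \right)}{2^v (v+1)^v \sqrt{\pi}\; \Gamma(v + 1/2)}. \tag{30}$$

For $v = \tfrac{1}{2}$ we find that $f(x)$ has a uniform distribution. For
$v = 1$, we have that the bivariate density is uniform over a
circular region in the plane. Using (30), we find

$$K_x = \frac{(v+1) \left[ B\!\left( \tfrac{1}{2}, \tfrac{2v+5}{6} \right) \right]^3}{6\, B\!\left( \tfrac{1}{2}, v + \tfrac{1}{2} \right)}. \tag{31}$$

From the magnitude density we derive that

$$K_r = \frac{v (v+1)}{24} \left[ B\!\left( \tfrac{2}{3}, \tfrac{v+2}{3} \right) \right]^3. \tag{32}$$
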

In Fig. 2 can be seen a plot of $K_x$ (solid line) and
$(2 K_r \pi^2/3)^{1/2}$ (dotted line) as a function of $v$ for values
from zero to ten. It should be noted that (30) also converges
to a Gaussian density as $v \to \infty$. It is a simple
matter to check that the expressions in (31) and (32)
indeed approach the correct limits. From the plot it can
be seen that for values of $v$ in the interval (0.0, 0.4) polar
format is better. In the interval (0.4, 3.635) it is seen that
rectangular is better, and from 3.635 to infinity polar
again is better. It appears then that for the circularly
symmetric bivariate density whose marginal density is
uniform, we have the interesting result that rectangular
format is asymptotically better than polar format.
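The regime changes described for the Pearson II family can be explored with a few lines of code. The closed forms used below follow from applying the Panter and Dite constant $K = \frac{1}{12}\left[\int f^{1/3}\right]^3$ to the Pearson II marginal density and to its magnitude density $2\pi r f(x, y)$; the function names are hypothetical, and the exact crossover points are not asserted here, only the behaviour at a few sample values of $v$.

```python
import math

def beta(a, b):
    """Beta function B(a, b) via the gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def pearson2_kx(v):
    """Panter-Dite constant of the unit-variance Pearson II marginal."""
    return (v + 1.0) / 6.0 * beta(0.5, (2.0 * v + 5.0) / 6.0) ** 3 / beta(0.5, v + 0.5)

def pearson2_kr(v):
    """Panter-Dite constant of the Pearson II magnitude density."""
    return v * (v + 1.0) / 24.0 * beta(2.0 / 3.0, (v + 2.0) / 3.0) ** 3

def polar_is_better(v):
    """Polar beats rectangular asymptotically when K_x > sqrt(2 K_r pi^2 / 3)."""
    return pearson2_kx(v) > math.sqrt(2.0 * pearson2_kr(v) * math.pi ** 2 / 3.0)

for v in (0.2, 0.5, 1.0, 10.0):
    print(v, pearson2_kx(v), pearson2_kr(v), polar_is_better(v))
```

For $v = 1/2$ the marginal is uniform and `pearson2_kx` evaluates to one, the familiar $\Delta^2/12$ result for a unit-variance uniform density, which is a useful sanity check on the formulas.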

In  our  analysis  and  in  the  examples  considered  so  far 
we  have  constrained  the  class  of  quantizers  considered  to 
two  different  types,  the  rectangular  format  and  the  polar 
format.  In  general,  neither  of  these  schemes  will  be  opti¬ 
mal  for  an  arbitrary  two-dimensional  random  variable 
with a circularly symmetric probability density. Zador [1]
gives an expression for the asymptotic mean square error
$E_o$ of the optimal two-dimensional mean square error
quantizer. This equation is

$$E_o = C_o / N, \tag{33}$$

where

$$C_o = \frac{5}{18\sqrt{3}} \left[ \iint f_{xy}^{1/2}(x, y)\, dx\, dy \right]^2. \tag{34}$$

For the Pearson VII density $C_o = 4.0307\, v/(v-1)$; for the
Pearson II density $C_o = 4.0307\, v/(v+1)$. Since, in the limit
as $v$ becomes large, both of these classes of densities
converge to the Gaussian, the smallest error attainable for
a two-dimensional normal random variable is approximately
$4.0307/N$. The best that we can do with a polar
format representation is $4.95/N$ and the best that we can
do with a Cartesian format representation is $5.442/N$.
There  is  certainly  room  for  improvement  here.  However, 
the  important  thing  to  note  is  that  the  structure  of  the 
polar  format  quantizer  is  known  while  that  of  the  theoreti¬ 
cal  optimum  quantizer  is  not. 
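The constant in Zador's expression can be evaluated numerically for circularly symmetric densities by integrating in polar coordinates. The sketch below assumes $C_o = \frac{5}{18\sqrt{3}}\left[\iint f^{1/2}\right]^2$ and uses $\iint f^{1/2}\,dx\,dy = \int_0^\infty 2\pi r\, f^{1/2}(r)\,dr$; the function names, the truncation radius, and the choice $v = 2$ for the Pearson II case are illustrative assumptions.

```python
import math

def zador_constant(radial_density, r_max, steps=100000):
    """C_o = 5/(18*sqrt(3)) * (integral over the plane of sqrt(f))**2.

    radial_density(r) is the bivariate density evaluated at radius r;
    the planar integral is computed as a radial midpoint rule on [0, r_max].
    """
    h = r_max / steps
    acc = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        acc += 2.0 * math.pi * r * math.sqrt(radial_density(r)) * h
    return 5.0 / (18.0 * math.sqrt(3.0)) * acc ** 2

def gaussian_2d(r):
    """Bivariate Gaussian with unit-variance marginals."""
    return math.exp(-r * r / 2.0) / (2.0 * math.pi)

def pearson2_2d(r, v=2.0):
    """Pearson II bivariate density with unit-variance marginals."""
    inside = 1.0 - r * r / (2.0 * (v + 1.0))
    return v / (2.0 * math.pi * (v + 1.0)) * max(inside, 0.0) ** (v - 1.0)

c_gauss = zador_constant(gaussian_2d, 60.0)
c_pearson2 = zador_constant(pearson2_2d, math.sqrt(6.0))
print(c_gauss, c_pearson2)
```

Comparing `c_gauss` with the polar and Cartesian coefficients quoted in the text gives a feel for how much room the structured quantizers leave relative to the unconstrained optimum.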




In  Section  II  it  was  stated  that  a  sufficient  condition  for 
(11)  to  be  valid  is  that  the  magnitude  density  function  be 
Riemann  integrable.  For  most  density  functions  of  inter¬ 
est  in  modeling  physical  systems,  this  criterion  is  met.  One 
group  of  densities  that  does  not  meet  this  condition  is  the 
set  of  atomic  densities,  i.e.,  densities  for  which  probability 
mass is contained at a single point. In a circularly symmetric
bivariate density, the phase must be uniformly
distributed on $[-\pi, \pi]$. The only quantity that can be discrete
is the magnitude distribution, i.e., we may have "rings" of
probability mass distributed in the plane. Suppose we
have a single "ring" of probability mass, where the radius
of the ring is one, i.e.,

$$F(r) = U(r - 1), \tag{35}$$

where $F(\cdot)$ is the magnitude distribution function and
$U(\cdot)$ is the unit step function. The rectangular component
marginal density is the sinusoidal density

$$f(x) = \frac{U(1 - x^2)}{\pi \sqrt{1 - x^2}}. \tag{36}$$

This density function is Riemann integrable, hence (11)
and (20) are valid. This implies the rectangular format
error is $O(N^{-1})$. Now consider the polar format case. For
$N_r \ge 1$, $E_{N_r}^{r} = 0$. This implies the polar format error for
large $N$ is $O(N^{-2})$. Clearly polar format is asymptotically
better for this density. By extending this argument, we
may say that if $P(r = 0) \ne 1$, then for any bivariate circularly
symmetric density with an atomic magnitude density
with a finite number of atoms, polar format will give a
smaller asymptotic mean square quantization error than
rectangular format.
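The decay rate claimed for the single-ring example can be illustrated numerically. The sketch below assumes a unit-radius ring, one magnitude level, and $N$ uniform phase cells with the optimal output magnitude $\sin(\pi/N)/(\pi/N)$, which gives a polar mean square error of $1 - \left[\sin(\pi/N)/(\pi/N)\right]^2$. The function name is hypothetical.

```python
import math

def ring_polar_error(n_levels):
    """Polar-format MSE for a unit-radius ring with n_levels uniform phase cells."""
    x = math.pi / n_levels
    sinc = math.sin(x) / x
    # Error r^2 + b^2 - 2*r*b*sinc with r = 1 and optimal b = sinc.
    return 1.0 - sinc ** 2

for n in (10, 100, 1000):
    err = ring_polar_error(n)
    # n^2 * err approaches pi^2 / 3, so the error decays like N^-2.
    print(n, err, n * n * err)
```

The product of $N^2$ and the error settles near $\pi^2/3$, whereas a rectangular-format error that decays like $N^{-1}$ would make the same product grow linearly with $N$.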


IV.  Summary 

In  this  paper  we  have  derived  a  simple  criterion  to 
determine  whether  rectangular  format  or  polar  format 
gives  smaller  mean  square  error  for  circularly  symmetric 
densities.  The  optimal  ratio  of  phase  quantizer  levels  to 
magnitude  quantizer  levels  is  also  derived.  Several  exam¬ 
ples  including  the  Gaussian  case  have  been  studied  in 
detail. 

It  is  interesting  to  note  that  polar  format  is  not  always 
better  than  rectangular  format  even  for  the  case  of  densi¬ 
ties  with  circular  symmetry. 


References 

[1] P. Zador, "Development and evaluation of procedures for quantizing
multivariate distributions," Ph.D. dissertation, Stanford University,
Stanford, CA, 1964.

[2] D. Chen, "On two or more dimensional optimum quantizers," Record
of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing
Conference, IEEE Press, pp. 640-643, 1977.

[3] J. Max, "Quantizing for minimum distortion," IEEE Trans. Inform.
Theory, vol. IT-6, pp. 7-12, Jan. 1960.

[4] P. F. Panter and W. Dite, "Quantization distortion in pulse count
modulation with nonuniform spacing of levels," Proc. IRE, vol. 39,
pp. 44-48, Jan. 1951.

[5] N. C. Gallagher, "Quantizing schemes for the discrete Fourier
transform of a random time series," IEEE Trans. Inform. Theory,
vol. IT-24, pp. 156-163, Mar. 1978.

[6] J. A. Bucklew and N. C. Gallagher, "A note on optimum quantization,"
IEEE Trans. Inform. Theory, vol. IT-25, pp. 365-366, May
1979.

[7] G. M. Roe, "Quantizing for minimum distortion," IEEE Trans.
Inform. Theory, vol. IT-10, pp. 384-385, Oct. 1964.

[8] R. C. Wood, "On optimum quantization," IEEE Trans. Inform.
Theory, vol. IT-5, pp. 248-252, Mar. 1969.

[9] W. A. Pearlman, "Quantization error bounds for computer generated
holograms," Stanford Univ. Inform. Syst. Lab., Stanford, CA,
Tech. Rep. #6503-1, Aug. 1974.


SOME  RESULTS  IN  MULTIDIMENSIONAL  QUANTIZATION  THEORY* 

James  A.  Bucklew 

Electrical  and  Computer  Engineering  Department 
University  of  Wisconsin,  Madison,  WI  53705 

and 

N. C. Gallagher, Jr.

Department  of  Electrical  Engineering 
Purdue  University,  West  Lafayette,  IN  47906 


Abstract 

This  paper  contains  several  results  in  multi¬ 
dimensional  quantization  theory.  The  first  section 
gives  a  simplified  derivation  of  a  well  known  upper 
bound  on  the  distortion  introduced  by  a  k-dimen- 
sional  optimum  quantizer.  It  is  then  shown  that  an 
optimum  multidimensional  quantizer  preserves  the 
mean  vector  of  the  input  and  that  the  mean  square 
quantization  error  is  given  by  the  sum  of  the  com¬ 
ponent  variances  of  the  input  minus  the  sum  of  the 
variances  of  the  output.  Lastly,  a  general  equa¬ 
tion  which  can  be  used  to  evaluate  the  performance 
of  multidimensional  companders  is  derived.  It  is 
shown  that  the  optimal  compander  must  be  conformal 
everywhere.  An  example  is  given  to  show  that  as¬ 
ymptotically  optimal  performance  could  be  obtained 
through  nonconformal  companding  schemes. 

I.  Introduction 

Block or vector quantization deals with the
representation of multidimensional elements with
a finite discrete set of values. The values to be
quantized may naturally fall into a k-dimensional
representation; typical examples are complex numbers,
positional coordinates, or state vectors.
In other cases, k-dimensional vectors are formed
from blocks of k samples taken from one-dimensional
signals. In 1964 Paul Zador published his Ph.D.
dissertation, which contains a number of very
interesting results on the properties of optimal
block quantizers for the r'th moment Euclidean
norm distortion measure [1]. Among Zador's
contributions are the derivation of both upper and
lower bounds on the distortion introduced by the
optimal quantizer. These bounds are derived without
actually finding the optimal quantizer.
Unfortunately, at some points Zador's development is
difficult to follow, and alternate derivations and
extensions by Gersho [2] and Yamada et al. [3]
have recently appeared. In Section II we present
an alternate derivation of Zador's random quantization
upper bound not treated in either [2] or
[3].

In [4] Bucklew and Gallagher show that for
one-dimensional mean squared error distortion the
optimum quantizer has the property that the mean
value of the quantizer output equals the mean
value of the input and also that the mean square
quantization error equals the variance of the
input minus the variance of the output. In [5]
Bucklew and Gallagher prove that the same results
hold for constant step size minimum mean squared
error quantizers. In Section III we extend these
properties to k-dimensional optimal block quantizers.

W. R. Bennett [6] was the first to model a
nonuniform quantizer as a zero memory nonlinearity
followed by a uniform quantizer in turn followed
by the inverse of the first zero memory nonlinearity.
This sequence of operations is generally
referred to as companding. The word arises
because the data is first "compressed", then quantized,
then "expanded". As a consequence the first
nonlinearity is generally referred to as the "compressor"
and its inverse the "expander".

The fourth section of this paper is an investigation
of companding in several dimensions. In
several dimensions the compressor characteristic
is a mapping function

$$f : R^k \to \mathop{\times}_{i=1}^{k} (0,1),$$

where $\times$ denotes the Cartesian cross product, and
$\times_{i=1}^{k} (0,1)$ is of course the k-dimensional hypercube.

In the companding approach to optimal quantization,
we have quantizer output levels distributed in the
hypercube. We choose from these output levels the
nearest neighbor (usually) to $f(x)$, where $x$ is the
input data vector. Our quantized output is then
$f^{-1}$ of this particular output level.

Our theory will hold for analog signal processing
in several dimensions also. It happens
that it does not matter whether the noise is quantization
noise or any other kind of additive noise
as long as the noise components in each channel are
uncorrelated with one another. For example, let
us denote the error vector caused by quantization
in the hypercube as $(r_1, r_2, \ldots, r_k)^T$. Then the
condition that is needed is that $E\{r_i r_j\}$ be proportional
to $\delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta function.
In a practical sense, this is not a very restrictive
assumption. It may be shown, at least asymptotically (as
the number of output levels in the hypercube
approaches infinity), that the error vector in an
optimal or random quantizer converges to a
hyperspherically symmetric probability density which
satisfies our above condition.

II.  Random  Quantization  Upper  Bound 

In [2] Gersho provides a very readable derivation
of Zador's expression for quantizer distortion.
To improve continuity and readability we
employ Gersho's notation: the quantizer input is a
k-dimensional random vector in $R^k$ which is quantized
to one of $N$ levels $y_1, y_2, \ldots, y_N$ in $R^k$.
The space $R^k$ is partitioned into $N$ disjoint and
exhaustive regions $S_1, S_2, \ldots, S_N$. The quantizer
is defined by the function $Q(x)$, where for a
k-dimensional input value $x$

$$Q(x) = y_i \quad \text{if } x \in S_i. \tag{1}$$

Note that this definition does not require $y_i \in S_i$,
although in practice $y_i$ is usually contained in $S_i$.
The performance of the quantizer is measured by
the distortion

$$D = \frac{1}{k}\, E\{ \| X - Q(X) \|^r \}, \tag{2}$$

where $\| \cdot \|$ denotes the usual $l_2$ norm, the operator
$E\{\cdot\}$ denotes statistical expectation, and the input
$X$ is a k-dimensional random input vector. The case
where $r = 2$ is the usual mean squared distortion.

The expression derived by Zador and Gersho for the
minimum distortion $D_0$ obtained by use of the best
quantizer is

$$D_0 = N^{-r/k}\, C(k, r)\, \| p(x) \|_{k/(k+r)}, \tag{3}$$

where

$$\| p(x) \|_\alpha = \left[ \int [p(x)]^\alpha\, dx \right]^{1/\alpha},$$

and where the constant $C(k, r)$, called the coefficient
of quantization, is independent of the density
$p(x)$ and is in general unknown. This expression
is an asymptotic result valid only for large
$N$. Two special cases for which the value of
$C(k, r)$ is known exactly are [2]

$$C(1, r) = \frac{1}{r+1}\, 2^{-r}, \tag{4}$$

and

$$C(2, 2) = \frac{5}{36\sqrt{3}}. \tag{5}$$

Consider the density $p(x)$ that has a constant value
of one over the unit volume hypercube; then
$\| p(x) \|_{k/(k+r)} = 1$. Consequently, Eq. (3) becomes

$$D_0 = N^{-r/k}\, C(k, r). \tag{6}$$

So, we see that by finding a bound on $D_0$ we also
bound $C(k, r)$. To find this bound we choose the
quantizer output levels to have a random distribution
uniformly distributed over the hypercube.
For a particular input value $x$, we find the closest
output level and quantize to that value. Because
this quantizer is not the optimum quantizer, the
associated distortion will bound from above the
distortion for the optimum quantizer.

To begin, place at random $N$ independent uniformly
distributed k-dimensional samples in the
hypercube. These will be our output levels. We
take the quantizer input $x$ to have a uniform distribution
over the hypercube. We also assume that
$N$ is sufficiently large so that there is a very
small probability that the quantizer input is
closer to an edge of the hypercube than to one of
the output values. Suppose that an input value $x$
has arrived and is sitting in the hypercube waiting
to be quantized. The probability that one particular
output value is within a distance $\rho$ of this
input sample is given approximately by the volume
of a sphere of radius $\rho$ about that sample point, or

$$\text{Prob(one particular output level is within } \rho \text{ of the input sample)} = V_k \rho^k, \tag{7}$$

where $V_k$ is the volume of the unit radius sphere;
then $V_k \rho^k$ is the volume of the sphere with radius
$\rho$. We are interested in the closest output level
to the input sample. We really want to know the
probability that the closest output level is within
a distance $\rho$ of the input sample. To compute
this probability, we combine classical order statistics
with the result found in Eq. (7). By
employing this approach, we compute the probability
density $f(\rho)$ for the distance between the input
sample and the nearest output level to be

$$f(\rho) = N \left[ 1 - V_k \rho^k \right]^{N-1} V_k\, k\, \rho^{k-1}. \tag{8}$$
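As a sanity check on the nearest-level density, the sketch below integrates $f(\rho) = N[1 - V_k\rho^k]^{N-1} V_k k \rho^{k-1}$ over $0 \le \rho \le V_k^{-1/k}$ and confirms that it has unit mass, then evaluates the second moment for $k = 2$, where the substitution $u = V_k\rho^k$ gives $1/(\pi(N+1))$. The function names, the choice $N = 50$, and the simple midpoint integrator are assumptions made for illustration; edge effects of the hypercube are ignored, as in the text.

```python
import math

def unit_ball_volume(k):
    """Volume V_k of the unit-radius ball in k dimensions."""
    return math.pi ** (k / 2.0) / math.gamma(k / 2.0 + 1.0)

def nearest_level_pdf(rho, n_levels, k):
    """Density of the distance from an input to the nearest of n_levels random levels."""
    v_k = unit_ball_volume(k)
    return n_levels * (1.0 - v_k * rho ** k) ** (n_levels - 1) * v_k * k * rho ** (k - 1)

def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * h) for i in range(steps)) * h

n_levels, k = 50, 2
rho_max = unit_ball_volume(k) ** (-1.0 / k)   # radius at which V_k * rho^k = 1
total_mass = integrate(lambda p: nearest_level_pdf(p, n_levels, k), 0.0, rho_max)
second_moment = integrate(lambda p: p ** 2 * nearest_level_pdf(p, n_levels, k), 0.0, rho_max)
print(total_mass, second_moment, 1.0 / (math.pi * (n_levels + 1)))
```

Moments of this density are the ingredients of the random quantization distortion, since the quantization error norm is the distance to the nearest output level.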

By construction $\rho = \| x - y \|$, where $x$ is the input
value and $y$ the output value. Consequently,


Moment  Properties  of 


In [4] and [5] it is shown that for minimum
mean squared error one-dimensional quantizers
the mean of the input equals the mean of the output
and the distortion equals the variance of the input
minus the output variance. It is shown that
these properties apply with and without the
equal-step-size constraint. In this section we generalize
these results to the k-dimensional case.


We  are  Interested  in  the  properties  of  quan¬ 
tizers  designed  to  minimize  the  distortion  defined 
by  Eq.  (2)  for  r  •  2 i 

D  -  £  E{  ||  X.  -  Q  (X)  ||2)  .  (14) 


Many  constraints  we  impose  on  the  quantizer  can  be 
imposed  by  the  functional  form  of  Q(x)  ;  for  ex¬ 
ample,  the  k  dimensional  version  of  the  equal-step- 
sizo  condition  might  require  the  regions  S^,  S2, 
...,  Sn  to  have  equal  area  and  be  jointly  con¬ 
gruent.  A  variational  approach  Is  used  in  the 
derivation.  Assume  the  optimum  quantizer  is  Qo<x); 
so,  an  arbitrary  quantizer  characteristic  can  be 
represented  as 


0(x)  -  C0(x>  ♦  €  «0(x)  ,  (1S) 


where  c  is  an  arbitrary  real  variable  and  6p(x)  is 
an  arbitrary  variation  chosen  so  that  CMx)  satis¬ 
fies  all  constraints  imposed  on  the  quantizer.  We 
know  that  the  optimum  choice  for  t  is  e»0;  it  is 
this  value  that  minimizes  the  distortion  D.  Thus, 
3D/3C  ■  0  at  C  ■  0;  so. 


-  7  57  e{!!x  -  SIX)  I!2)  -  o  (16) 

“  3C  €a0 
or 

-  £  t{ (X  -  C0(X))6cT(X)}  -  0,  (17) 

where  we  note  that  Q(x)  is  a  vector  valued  function 
so  that  6Q(x)  represents  an  arbitrary  variation  and 
consider  the  case  where  each  component  of  this  var¬ 
iation  vector  equals  one i  consequently,  Eq.  (17) 
becomes 

E(X  -  0Q (X) }  -  0, 
or 


e{x)  »  e{Q0(X)}  .  (18) 

Now  consider  the  case  where  <Sq  (  jc)  «  CQ (x)  ;  then, 
E((X  -  <SQ  (X) ) (X) )  -  0, 
or 

e{xQq(X)1  -  E{  11  Q0  (X)  11  2}  .  (19) 

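When the optimum quantizer is in use, the distortion $D_0$ is

$$\begin{aligned}
D_0 &= \frac{1}{k} E\{\|X - Q_0(X)\|^2\} \\
    &= \frac{1}{k} E\{[X - Q_0(X)][X - Q_0(X)]^T\} \\
    &= \frac{1}{k}\Big[E\{\|X\|^2\} - E\{X Q_0^T(X)\} - E\{Q_0(X) X^T\} + E\{\|Q_0(X)\|^2\}\Big].
\end{aligned} \tag{20}$$

We combine the results of Eq. (19) with Eq. (20) to produce

$$D_0 = \frac{1}{k}\Big[E\{\|X\|^2\} - E\{\|Q_0(X)\|^2\}\Big]. \tag{21}$$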

The results of Eq. (21) combine with those in Eq. (18) to provide the multidimensional extension of the one-dimensional case found in [4] and [5].

We note here that this derivation is quite general. It applies to the unconstrained optimal quantizer as well as to the equal volume congruent area (equal step size) quantizer, because this constraint can be included directly in the functional form of Q(x).
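The two moment properties in Eqs. (18) and (21) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes Gaussian training samples, a Lloyd-style iteration (nearest-neighbour partition followed by a centroid update), and hypothetical helper names `nearest` and `lloyd_step`. Because every output vector is the centroid of the samples assigned to it, both identities hold for the sample averages up to rounding error.

```python
import random

def nearest(x, codebook):
    """Index of the codevector closest to x in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, codebook[j])))

def lloyd_step(samples, codebook):
    """One Lloyd iteration: nearest-neighbour partition, then centroid update."""
    k = len(samples[0])
    labels = [nearest(x, codebook) for x in samples]
    new_codebook = []
    for j, old in enumerate(codebook):
        cell = [x for x, lab in zip(samples, labels) if lab == j]
        if cell:
            new_codebook.append([sum(x[d] for x in cell) / len(cell)
                                 for d in range(k)])
        else:
            new_codebook.append(list(old))  # empty cell: keep the old codevector
    return labels, new_codebook

random.seed(0)
k, n_levels = 2, 8  # illustrative dimension and number of output vectors
samples = [[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(2000)]
codebook = [list(x) for x in samples[:n_levels]]
for _ in range(20):
    labels, codebook = lloyd_step(samples, codebook)

# Quantizer output for each sample, using the partition that produced the centroids.
outputs = [codebook[lab] for lab in labels]
n = len(samples)
mean_in = [sum(x[d] for x in samples) / n for d in range(k)]
mean_out = [sum(q[d] for q in outputs) / n for d in range(k)]
energy_in = sum(sum(a * a for a in x) for x in samples) / n
energy_out = sum(sum(a * a for a in q) for q in outputs) / n
distortion = sum(sum((a - b) ** 2 for a, b in zip(x, q))
                 for x, q in zip(samples, outputs)) / (n * k)

print(mean_in, mean_out)                         # Eq. (18): the two means agree
print(distortion, (energy_in - energy_out) / k)  # Eq. (21): the two values agree
```

The check relies only on the centroid condition, so it would pass for any partition, which is in line with the remark that the derivation also covers constrained quantizers.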


Compander Error Derivation


Our data will be assumed to be k-dimensional samples from a probability density function $p(x)$, $x \in \mathbb{R}^k$. Denote $D_p$ as the support of $p(x)$. Let

$$f : D_p \to \prod_{i=1}^{k} (0,1)$$

be such that f is regular and onto.


We force f to be onto because if it were not, there would be code vectors in the hypercube that would never be used. This would imply that the quantizer would have to be suboptimal. We use this condition at only one point in the derivation, as a constraint on the optimal compander. All equations derived up to that point are still valid without this restriction. We will sometimes represent this mapping as

$$f = (f_1(x), f_2(x), \ldots, f_k(x))^T.$$


Let $r = (r_1, r_2, \ldots, r_k)^T$ be the error vector in the hypercube. As stated above, under some fairly general conditions,

$$E\{r_i r_j\} = \frac{E\{r^2\}}{k}\,\delta_{ij},$$

where $\delta_{ij}$ is the Kronecker delta. Assuming very small distortion, a good approximation to the final error vector in the output is $(f^{-1})'(y)\,r$. Let y be the variable in the hypercube. If $y = f(x)$ then

$$p_y(y) = \frac{p_x(f^{-1}(y))}{|f'(f^{-1}(y))|}.$$

Therefore the final output mean square error (mse) may be written

$$\text{mse} = \int_{(0,1)^k} r^T \big[(f^{-1})'(y)\big]^T \big[(f^{-1})'(y)\big]\, r\, \frac{p_x(f^{-1}(y))}{|f'(f^{-1}(y))|}\, dy.$$

Let $x = f^{-1}(y)$; then $dx = |(f^{-1})'(y)|\,dy$, and note that $|(f^{-1})'(y)| = 1/|f'(f^{-1}(y))|$ by the inverse mapping theorem [7]. Therefore, making these changes of variable, we obtain

$$\text{mse} = \int_{D_p} r^T \big[f'(x)\big]^{-1T} \big[f'(x)\big]^{-1} r\, p_x(x)\, dx,$$

again by the inverse mapping theorem. Denote $[f'(x)]^{-1T}[f'(x)]^{-1} = \Sigma^{-1}(x)$ and note this is a symmetric matrix for every x. Therefore our problem is to optimize

$$\int_{D_p} r^T\, \Sigma^{-1}(x)\, r\; p_x(x)\, dx.$$


Using a matrix identity the above integral becomes

$$\int_{D_p} \operatorname{tr}\{\Sigma^{-1}(x)\, r r^T\}\, p_x(x)\, dx.$$

Let us now take the expectation over the r variable, which is independent of any other quantity in the integral (one can make a random coding argument to do this),

$$E\{r r^T\} = \frac{E\{r^2\}}{k}\, I.$$

Therefore

$$\text{mse} = \frac{E\{r^2\}}{k} \int_{D_p} \operatorname{tr}\{\Sigma^{-1}(x)\}\, p_x(x)\, dx.$$

This expression is of interest in its own right. $E\{r^2\}/k$ is the mean square error per sample suffered by the hypercube quantization. So the total error is a product of two terms operating independently of one another. Denote the eigenvalues of $\Sigma(x)$ as $\lambda_i^2(x)$ (i = 1, ..., k). Then

$$\text{mse} = \frac{E\{r^2\}}{k} \sum_{i=1}^{k} \int_{D_p} \frac{p_x(x)}{\lambda_i^2(x)}\, dx.$$


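Since our map f is onto this implies

$$\int_{D_p} |f'(x)|\, dx = \int_{D_p} \prod_{i=1}^{k} \lambda_i(x)\, dx = 1.$$

Let us minimize the mse subject to the above constraint. It is easy to show first of all that $\lambda_i(x) = \lambda(x)$ for every i. So now minimize

$$\int_{D_p} \frac{p(x)}{\lambda^2(x)}\, dx \quad \text{subject to the constraint} \quad \int_{D_p} \lambda(x)^k\, dx = 1.$$

Let $\beta(x) = \lambda^k(x)$, so minimize

$$\int_{D_p} \frac{p(x)}{\beta(x)^{2/k}}\, dx \quad \text{where} \quad \int_{D_p} \beta(x)\, dx = 1.$$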

Gersho [2] shows that the optimal $\beta(x)$ is proportional to $p(x)^{k/(k+2)}$. This implies

$$\lambda(x) = \frac{p(x)^{1/(k+2)}}{\big(\|p\|_{k/(k+2)}\big)^{1/(k+2)}}.$$

Using these eigenvalues, the mse $= E\{r^2\}\,\|p\|_{k/(k+2)}$. If an optimal k-dimensional uniform quantizer is implemented in the hypercube this equation gives the same error as Zador's optimum quantizer. Our condition for the optimal compressor is that all of the eigenvalues of the symmetric matrix

$$\Sigma(x) = [f'(x)][f'(x)]^T$$

are the same; this implies there exists an orthonormal matrix $\Phi(x)$ such that

$$\Phi^T(x)\,\Sigma(x)\,\Phi(x) = \lambda^2(x)\, I,$$

so that

$$\Sigma(x) = \lambda^2(x)\, I = [f'(x)][f'(x)]^T,$$

which implies $f'(x)/\lambda(x)$ is an orthonormal matrix. Since we know what $\lambda(x)$ is, in principle we could solve for $f'(x)$ for every value of x. Therefore our condition for an optimal compander is that

$$\frac{f'(x)}{c\, p(x)^{1/(k+2)}}$$

be an orthogonal matrix for almost every value of x, where $c = 1/\big(\|p\|_{k/(k+2)}\big)^{1/(k+2)}$.
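The text defers to Gersho [2] for the optimal $\beta(x)$. As a reading aid, the following formal Lagrange-multiplier sketch recovers the same proportionality and the resulting error; it is not claimed to be Gersho's argument, the multiplier $\mu$ is introduced here only for illustration, and regularity conditions are ignored.

```latex
\begin{align*}
J[\beta] &= \int_{D_p} \frac{p(x)}{\beta(x)^{2/k}}\,dx
            + \mu\Big(\int_{D_p}\beta(x)\,dx - 1\Big), \\
0 &= -\frac{2}{k}\,p(x)\,\beta(x)^{-(k+2)/k} + \mu
   \quad\Longrightarrow\quad
   \beta(x) = \frac{p(x)^{k/(k+2)}}{\int_{D_p} p(u)^{k/(k+2)}\,du}, \\
\int_{D_p} \frac{p(x)}{\beta(x)^{2/k}}\,dx
  &= \Big[\int_{D_p} p(x)^{k/(k+2)}\,dx\Big]^{(k+2)/k}
   = \|p\|_{k/(k+2)} .
\end{align*}
```

With $\lambda(x) = \beta(x)^{1/k}$ this is consistent with the eigenvalue expression and with mse $= E\{r^2\}\,\|p\|_{k/(k+2)}$ stated in the text.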


When k = 2 this condition says that f(x) must be conformal almost everywhere. Excluding sets of measure zero is an important point. Gersho points out (for the 2-dimensional case) that conformal maps do not exist for circularly symmetric probability densities. One consequence of this is the work by Heppes and Szusz [8], which shows that you can't tessellate a circular region with an arbitrary "surface distribution function" using regular hexagons. There must always be a "slit" where the tessellation fails. This "slit", however, is a set of measure zero. It is only local conformality almost everywhere that we need, not global conformality.

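We will now do an example illustrating the use of Eq. (22). Suppose our input probability density p(x) can be written as $\prod_{i=1}^{k} p(x_i)$. Let $C = 1/\int_{-\infty}^{\infty} p(x)^{\alpha}\, dx$ and let our compressor function be $f = (f_1(x_1), f_2(x_2), \ldots, f_k(x_k))^T$, where $f_i(x_i) = C\int_{-\infty}^{x_i} p(x)^{\alpha}\, dx$. With little loss of generality, we will assume f is regular. It is obviously onto. Hence

$$f'(x) = \begin{pmatrix}
C p(x_1)^{\alpha} & & & 0 \\
 & C p(x_2)^{\alpha} & & \\
 & & \ddots & \\
0 & & & C p(x_k)^{\alpha}
\end{pmatrix}.$$

The eigenvalues of $\Sigma^{-1}(x)$ are $1/\big(C^2 p(x_i)^{2\alpha}\big)$, i = 1, ..., k. So the error may be written

$$\begin{aligned}
\text{mse} &= \frac{E\{r^2\}}{k} \sum_{i=1}^{k} \int_{D_p} \frac{\prod_{j=1}^{k} p(x_j)}{C^2\, p(x_i)^{2\alpha}}\, dx \\
 &= \frac{E\{r^2\}}{C^2} \int p(x)^{1-2\alpha}\, dx \\
 &= E\{r^2\} \Big[\int p(x)^{\alpha}\, dx\Big]^2 \Big[\int p(x)^{1-2\alpha}\, dx\Big].
\end{aligned}$$

Using Hölder's inequality we may show that $\alpha = 1/3$ minimizes the error, or

$$\text{mse} = E\{r^2\}\, \|p\|_{1/3}.$$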
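The claim that $\alpha = 1/3$ minimizes $[\int p^{\alpha}]^2 [\int p^{1-2\alpha}]$ can be spot-checked for one density. The sketch below assumes a unit-variance Gaussian marginal, for which $\int p(x)^a\,dx = (2\pi)^{(1-a)/2}/\sqrt{a}$ in closed form; the function name `product_form` and the search grid are illustrative choices, and a check on a single density is not a proof.

```python
import math

def product_form(alpha):
    """[int p^alpha]^2 * [int p^(1 - 2 alpha)] for the N(0, 1) density p."""
    def power_integral(a):
        # Closed form of int p(x)^a dx for the standard Gaussian density.
        return (2.0 * math.pi) ** ((1.0 - a) / 2.0) / math.sqrt(a)
    return power_integral(alpha) ** 2 * power_integral(1.0 - 2.0 * alpha)

# The second integral only converges for alpha < 1/2.
alphas = [i / 1000.0 for i in range(1, 500)]
best_alpha = min(alphas, key=product_form)

# ||p||_{1/3} = [int p^(1/3) dx]^3 = 6 * sqrt(3) * pi for the standard Gaussian.
norm_one_third = 6.0 * math.sqrt(3.0) * math.pi

print(best_alpha)
print(product_form(1.0 / 3.0), norm_one_third)
```

At $\alpha = 1/3$ the product equals $\|p\|_{1/3}$, which is the factor multiplying $E\{r^2\}$ in the minimized error.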


But looking at Zador's coefficient for the one-dimensional case (see Eq. 4) we have

$$\text{mse}_{1\text{-Dim}} = \frac{\|p\|_{1/3}}{12 N^2}.$$

Therefore this compressor characteristic gives us the same error as the optimal 1-dimensional quantizer if in the hypercube we quantize with one-dimensional uniform quantizers. We can quantize in the hypercube using optimal schemes for a coefficient of (as $k \to \infty$)

$$\frac{\|p\|_{1/3}}{N^2\, 2\pi e}.$$

Therefore the best we may do with this compressor characteristic is a gain of $2\pi e/12 = 1.42$ in signal to quantizing noise ratio, at the expense of implementing optimal uniform quantizers in the hypercube.
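The quoted gain is the ratio of the two coefficients, 1/12 for scalar uniform quantization and 1/(2πe) in the large-k limit. A short computation, given here only as arithmetic confirmation (the decibel conversion is an added convenience, not a figure from the text):

```python
import math

# Ratio of the scalar coefficient 1/12 to the limiting coefficient 1/(2*pi*e).
gain = 2.0 * math.pi * math.e / 12.0
gain_db = 10.0 * math.log10(gain)  # the same ratio expressed in decibels

print(round(gain, 3), round(gain_db, 2))
```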

As a second example suppose again $p(x) = \prod_{i=1}^{k} p(x_i)$. Suppose we choose the eigenvalues of $\Sigma(x)$ to be


J 


2 


A2(xl- 


3-1 

JjLL 


Pt*^ 


1 

k+1 


/  P(x) 


k-1 

k*l 


dx 


This obviously leads to a nonconformal map. Using Eq. (22), we may now evaluate the error for such a compressor characteristic to obtain the mean square error

$$\text{mse} = E\{r^2\}\, \|p\|_{\frac{k-1}{k+1}} = E\{r^2\}\, \|p\|_{\frac{k-1}{(k-1)+2}},$$

which are the optimal coefficients for (k-1)-dimensional space. This implies the possibility of obtaining nonconformal mapping functions that will asymptotically give optimal results.


References 


[1] P. Zador, Development and Evaluation of Procedures for Quantizing Multivariate Distributions, Ph.D. Dissertation, Stanford University, University Microfilm no. 64-9855.

[2] A. Gersho, "Asymptotically Optimal Block Quantization", IEEE Trans. Inform. Theory, Vol. IT-25, pp. 373-380, July 1979.

[3] Y. Yamada, S. Tazaki, and R. M. Gray, "Asymptotic Performance of Block Quantizers with Difference Distortion Measures", to appear in IEEE Trans. Inform. Theory.

[4] J. A. Bucklew and N. C. Gallagher, "A Note on Optimal Quantization", IEEE Trans. Inform. Theory, Vol. IT-25, pp. 365-366, May 1979.

[5] J. A. Bucklew and N. C. Gallagher, "Optimum Uniform Quantizers", to appear in Inform. Theory, Sept. 1980.

[6] W. R. Bennett, "Spectra of Quantized Signals", B.S.T.J., Vol. 27, pp. 446-472, July 1948.

[7] W. Fleming, Functions of Several Variables, Springer-Verlag, 1977.

[8] A. Heppes and P. Szusz, "Bemerkung zu einer Arbeit von L. Fejes Toth", El. Math., 15, 1960, pp. 134-136.


This work was partially supported by Grant AFOSR-78-3605.


SOME  RECENT  DEVELOPMENTS  IN  QUANTIZATION  THEORY* 


(Invited  Paper) 

by 

Neal  C.  Gallagher,  Jr. 
School  of  Electrical  Engineering 
Purdue  University 
West Lafayette, IN 47907

and 

James  A.  Bucklew 
Department  of  Electrical  and 
Computer  Engineering 
University  of  Wisconsin 
Madison, Wisconsin 53705


Abstract 

A critical review of many important developments in quantization theory is presented, beginning with Bennett's 1948 paper [1]. The purpose of this study is to resolve some seemingly conflicting results. We then turn to a discussion of block or vector quantizers. We show that minimum mean squared error block quantizers preserve the input mean in the output variable and that the error equals the variance of the input minus the variance of the output. We also illustrate a way by which the compander method of quantizer implementation may be extended to block quantizers.

I. Introduction

The quantization problem has been around for ages. The fact is that almost all real numbers must be quantized if they are to be represented by use of a finite number of digits. If we are to choose a real number at random, the probability is one that the number would need to be quantized for representation with a finite number of digits. Early modern work on quantization includes the work of Bennett [1], and Panter and Dite [2]. Bennett is the first to present an analysis of companding systems. A typical companding system is shown in Fig. 1, where the system input is x and output is y.


[Block diagram: COMPRESSOR, UNIFORM QUANTIZER, EXPANDER]


Figure 1. Typical Companding System


*This work was supported in part by the Air Force Office of Scientific Research under Grant AFOSR 78-3605.


The input is first compressed by the nonlinearity $G(\cdot)$, whose output is uniformly quantized over the interval [0,1]. It is this quantized value that may be transmitted over a communication link or stored in digital memory. When we require a true representation of this quantized value, this uniformly quantized value is expanded by the nonlinearity $G^{-1}(\cdot)$. Bennett presents an expression for the mean square quantization error for a companding system in the asymptotic (large N) case. This work was further extended by Panter and Dite, who studied the design of optimum non-uniform step size quantizers. They derived asymptotic expressions (large N) for finding the minimum mean squared error quantizer design.
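The compress, quantize, expand chain of Fig. 1 can be sketched in a few lines. This is an illustration only: the logistic function used as the compressor is an assumption chosen because it maps the real line onto (0, 1) and has a simple inverse; it is not the nonlinearity analysed by Bennett, and the function names are hypothetical.

```python
import math

def compress(x):
    """Illustrative compressor G: maps the real line onto (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def expand(u):
    """Expander: the inverse of `compress`."""
    return math.log(u / (1.0 - u))

def uniform_quantize(u, n_levels):
    """Uniform quantizer on [0, 1): returns the midpoint of the cell holding u."""
    index = min(int(u * n_levels), n_levels - 1)
    return (index + 0.5) / n_levels

def compand(x, n_levels):
    """Compress, quantize uniformly, then expand."""
    return expand(uniform_quantize(compress(x), n_levels))

for x in (-3.0, -0.2, 0.0, 0.7, 4.0):
    print(x, compand(x, 16))
```

Only the cell index needs to be stored or transmitted; the non-uniform spacing of the reconstruction levels comes entirely from the expander.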

In his studies, Bennett made a number of empirical observations concerning the statistical properties of quantized signals. These observations were given a theoretical foundation by Widrow [3]. Widrow showed that the instantaneous quantization error, which is a signal dependent error, can be treated as statistically independent of the signal and uniformly distributed over the quantizer step size (for equal step size quantizers) when the number of quantization levels is sufficiently large. In another often referenced paper, Smith [4] further extended Panter and Dite's results and compared theoretical and experimental studies. These four papers by Bennett, Panter and Dite, Widrow, and Smith form the basis for subsequent work on quantization.

By 1957 there was still no exact solution for the optimum quantizer; however, during this time at Bell Labs, Lloyd [5] completed an unpublished technical memorandum in which he provides a method of solution for the optimum quantizer. It is unfortunate that Lloyd's work was never published, because it is Max's 1960 paper [6] that receives most of the acknowledgement for solving the optimum quantizer design problem. Max's paper is probably the most widely referenced paper on quantization. In their respective papers Lloyd and Max develop necessary conditions for the optimum quantizer; however, these conditions are not sufficient and they can be satisfied by non-optimum quantizers. In 1964 Fleischer [7] presented conditions under which Max's results are also sufficient. Fleischer's conditions establish that Max's results are both necessary and sufficient for the optimum quantization of many random variables which have common distributions such as Gaussian or Rayleigh.


To be presented at the 12th Annual Symposium on System Theory, May 19-20, 1980, Virginia Beach, Virginia.

We move to 1977, when Sripad and Snyder [8] considered the correlation between the input signal and quantization noise. Their work actually represents a re-evaluation of Widrow's results. They developed necessary and sufficient conditions for the quantization error to be uniform and uncorrelated with the input. These conditions are very restrictive and are not satisfied by most common densities of interest. Although Sripad and Snyder do not actually do so, the flavor of their paper is to contradict the observations of Bennett and Widrow that the quantization noise behaves like uniformly distributed, uncorrelated (independent) additive noise. The difference between these apparently conflicting claims is that Sripad and Snyder are saying that quantization noise is usually not exactly uniformly distributed and uncorrelated with the signal, while Widrow is saying that although these properties are not exact they often are almost valid. Experimental evidence seems to verify Widrow's conclusions as being valid in most situations, while Sripad and Snyder are cautioning us to be careful in applying Widrow's conclusions.
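The "almost valid" reading can be illustrated with a small simulation. This is a sketch under stated assumptions (a unit-variance Gaussian input, a step of 0.1, and a quantizer with unbounded range); it is not an experiment reported in the paper, and a different input density or a coarser step could behave quite differently, which is the point of the caution above.

```python
import math
import random

def quantize(x, step):
    """Uniform quantizer with unbounded range: nearest multiple of step."""
    return step * math.floor(x / step + 0.5)

random.seed(1)
step = 0.1  # illustrative step size, small relative to the input spread
xs = [random.gauss(0.0, 1.0) for _ in range(20000)]
errors = [quantize(x, step) - x for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_e = sum(errors) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n
var_e = sum((e - mean_e) ** 2 for e in errors) / n
cov = sum((x - mean_x) * (e - mean_e) for x, e in zip(xs, errors)) / n

noise_ratio = var_e / (step ** 2 / 12.0)      # near 1 if the error looks uniform
correlation = cov / math.sqrt(var_x * var_e)  # near 0 if error and input are uncorrelated
print(noise_ratio, correlation)
```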

The work of Widrow, and Sripad and Snyder applies only to uniform step size quantizers with an infinite number of output levels; of course, real quantizers have only a finite number of output levels. Thus, for a real quantizer the error analysis may be divided into two parts: one part occurs when the input signal falls within the quantizer's range, called non-truncation error, and the other is called truncation error and occurs when the input signal falls beyond the quantizer's range. The analysis of Widrow, and Sripad and Snyder implicitly assumes that the contribution of the truncation error can be made arbitrarily small by choosing the quantizer range to be arbitrarily large. For a quantizer with a finite number of output levels this is not possible, because the quantizer error will increase in an unbounded manner as the quantizer range increases. If we turn attention to optimum uniform step size quantizers, where the quantizer step size is chosen so as to minimize the total error, we can study the optimum relationship between the truncation and non-truncation errors. We may then study the limiting behavior of the error as the number of output levels becomes large and determine the relative effect of the truncation error. In section II we will study the effect of truncation error and illustrate through an example the fact that truncation cannot be ignored.

Optimum quantizers, both uniform step-size and non-uniform step-size, possess a number of interesting properties not proven until the 1979 paper of Bucklew and Gallagher [9]. Here it is shown that for the non-uniform step-size minimum mean squared error quantizer the output mean value is equal to the input mean value. It is also shown that the quantizer's error is equal to the input variance minus the output variance. In an unpublished manuscript Bucklew and Gallagher prove that the minimum mean squared error uniform step size quantizer possesses these same two properties. By using these properties it can also be shown that the correlation between the quantizer error and input signal is equal to minus the mean squared error. Consequently, for minimum mean squared error quantizers, the signal and noise are negatively correlated, but this correlation is near zero for quantizers with small error. In section III, we present a novel derivation of the aforementioned properties; this derivation is general and is valid for both the optimum non-uniform step-size and uniform step size quantizer.

To this point we have only discussed the quantization of scalar quantities. Often the data to be quantized naturally falls into a k-dimensional representation; typical examples are complex numbers, positional coordinates, or state vectors. In other cases, k-dimensional vectors are formed from blocks of k samples taken from one-dimensional signals. The topic of block or vector quantization deals with the representation of multidimensional elements with a finite discrete set of values. In 1964 Zador published his Ph.D. dissertation, which contains a number of very interesting results on the properties of optimal block or vector quantizers for the r'th moment Euclidean norm distortion measure [10]. Among Zador's contributions are the derivation of both upper and lower bounds on the distortion introduced by the optimal quantizer. Unfortunately, at some points Zador's development is difficult to follow, and alternate derivations and extensions by Gersho [11] in 1979, and Yamada et al. [12] in 1980 have recently appeared. In section V we present an alternate derivation of Zador's upper bound. Unfortunately, this work on vector quantizers provides very few clues on how to actually find the best quantizer, and this remains an unsolved problem at present.

Some of the early work on the implementation of vector quantizers actually occurred in the study of computer-generated holograms; see the work of Pearlman [13] and Gallagher [14] for references. The questions treated in this work concern the representation of two-dimensional vectors in quantized polar format and quantized rectangular format. The reasoning behind this work is to investigate the relative merits of those two-dimensional quantizers that we know how to implement, whereas we don't know the optimum implementation. In their 1978 paper Pearlman and Gray [15] employ an information theoretic approach to study the quantization of two-dimensional Gaussian vectors where the vector's X and Y components are independent, zero mean, and identically distributed. In particular they compare polar quantization against rectangular quantization. They show that, when the vector is in polar form, the phase component carries significantly more information than the magnitude component. As a result, the phase component should be quantized very finely in comparison to the magnitude component. Pearlman and Gray show that for a fixed number of output levels, $N_\theta N_R$ = constant, the optimum ratio between the number of phase levels $N_\theta$ and the number of magnitude levels $N_R$ is approximately $N_\theta/N_R = 2.6$. In 1979, using a non-information theoretic approach, Bucklew and Gallagher [16] rederive this same ratio and then generalize the analysis to circularly symmetric distributions [17]. It is found that in most, but not all, cases polar format quantization is better than rectangular format.




The problem of the design and implementation of optimum vector quantizers remains open. Sections IV, V, and VI of this paper will discuss some recent work toward the solution. In section IV we show that the optimum vector quantizer shares some common properties with the optimum scalar quantizer; in particular the mean value of the quantizer output equals the mean value of the input, and the mean squared error equals the input variance minus the output variance. Section V contains a simplified derivation of Zador's upper bound on the quantizer error, and section VI discusses the possibility of extending the companding concept to multidimensional quantization.


II. Truncation Errors in Optimum Uniform
Step-Size Quantizers


Much of the work dealing with the properties of uniform step size quantizers assumes a nonzero step-size Δ with an infinite (N = ∞) number of output levels. In other words the quantizer has infinite range and never reaches a saturation point, which is the largest (or smallest) value to which an input may be quantized. If an input value falls between the largest and smallest saturation points, we say that it is within the quantizer's non-truncation region. If an input value falls beyond a saturation point, we say that the input falls within the truncation region and call this type of error truncation error. Practical quantizers have truncation errors, and it is the tradeoff between the truncation and non-truncation errors that is optimized when we design a minimum error quantizer.
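The tradeoff between truncation and non-truncation error can be seen numerically. The following is a minimal sketch, not taken from the paper: it assumes a unit-variance Gaussian input, an 8-level midrise quantizer whose outermost levels sit half a step inside the saturation points, and a Monte Carlo estimate of the mean squared error. The function names, sample size, and step grid are illustrative choices.

```python
import math
import random

def saturating_quantizer(x, step, n_levels):
    """Midrise uniform quantizer with n_levels outputs (n_levels even)."""
    half = n_levels // 2
    index = math.floor(x / step)              # cell index before saturation
    index = max(-half, min(half - 1, index))  # clamp to the available cells
    return (index + 0.5) * step

def mse(samples, step, n_levels):
    """Sample mean squared error of the quantizer for a given step size."""
    return sum((x - saturating_quantizer(x, step, n_levels)) ** 2
               for x in samples) / len(samples)

random.seed(2)
samples = [random.gauss(0.0, 1.0) for _ in range(20000)]
n_levels = 8
steps = [0.1 * i for i in range(2, 16)]  # 0.2, 0.3, ..., 1.5
curve = [(step, mse(samples, step, n_levels)) for step in steps]
best_step, best_mse = min(curve, key=lambda pair: pair[1])

for step, value in curve:
    print(f"step={step:.1f}  mse={value:.4f}")
print("best step on this grid:", round(best_step, 1))
```

Small steps leave most of the error in the truncation region, large steps make the in-range error dominate, and the minimum lies between the two.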

A common approximation for the mean squared error of a quantizer with step size Δ is Δ²/12. This approximation is derived under the assumption that the truncation error is negligible. In many non-pathological situations the contribution of the truncation error cannot be ignored; this can be most simply illustrated through an example. Consider the probability density function


$$f(x) = \frac{1 + \delta/2}{(1 + |x|)^{3+\delta}}. \tag{1}$$

Let the non-truncation region be (-T, T), where the value of T is approximately given by T = NΔ/2. If the quantizer input falls in the region [T, ∞), the output value is T + Δ/2. If the input falls in the region (-∞, -T], the output is quantized to -T - Δ/2. If the input falls within the non-truncation region (-T, T), then the mean squared error is accurately approximated as Δ²/12. Because the density in (1) is even, we can write the following approximate expression for the mean squared error D:

$$D = 2\int_T^{\infty} \Big[x - T - \frac{\Delta}{2}\Big]^2 f(x)\, dx + \frac{\Delta^2}{12}\, P\{|x| < T\}. \tag{2}$$

In the limit as the number of output levels N goes to infinity, the value of T also approaches infinity and the value of Δ goes to zero. Consequently, for large N, the second term in (2) is accurately represented by Δ²/12. In the first term of (2), for large values of T, the density in (1) may be approximated as

$$f(x) \approx \frac{1 + \delta/2}{|x|^{3+\delta}}; \tag{3}$$

therefore, for large T,

$$\begin{aligned}
D &\approx 2\int_T^{\infty} \Big[x - T - \frac{\Delta}{2}\Big]^2 \frac{1 + \delta/2}{|x|^{3+\delta}}\, dx + \frac{\Delta^2}{12} \\
  &= K_1 T^{-\delta} + \frac{\Delta^2}{4}\, T^{-(2+\delta)} - K_2\, \Delta\, T^{-(1+\delta)} + \frac{\Delta^2}{12},
\end{aligned} \tag{4}$$

where

$$K_1 = \frac{2}{\delta(1+\delta)}$$

and

$$K_2 = \frac{1}{1+\delta}.$$

Equation (4) may be rewritten, using T = NΔ/2 and keeping the dominant terms, as

$$D \approx K_1 \Big(\frac{N\Delta}{2}\Big)^{-\delta} + \frac{\Delta^2}{12}. \tag{5}$$

This is an approximate expression valid for large N. To find the optimum value for Δ, we take the derivative of D with respect to Δ and set the resulting expression equal to zero. This yields

$$K_1 \Big(\frac{N\Delta}{2}\Big)^{-\delta} = \frac{\Delta^2}{6\delta}; \tag{6}$$

consequently, the minimum D is given by

$$D = \frac{\Delta^2}{12}\Big(1 + \frac{2}{\delta}\Big). \tag{7}$$

Depending on the value of δ, the value of D can be significantly larger than the common Δ²/12 approximation. Loosely speaking, the validity of this approximation seems to depend upon the existence of higher order moments. If all moments exist, then the approximation appears to be asymptotically correct. If only a few moments exist, it does not seem to be a good approximation.


III. First and Second Moment Properties of
Optimum Quantizers



At this point a general definition of a quantizer is required. First, the input signal space is partitioned into N disjoint and exhaustive regions $S_1, S_2, \ldots, S_N$. The quantizer is defined by the function Q(x), where for input value x

$$Q(x) = y_i, \quad \text{if } x \in S_i. \tag{8}$$

Note that this definition does not require $y_i \in S_i$, although in practice $y_i$ is usually contained in $S_i$.

The performance of the quantizer is measured by the mean squared distortion

$$D = E\{[X - Q(X)]^2\}, \tag{9}$$

for random input X. Assume that the optimum quantizer characteristic is denoted by $Q_0(x)$. At this point, we may or may not add the restriction that $Q_0(x)$ represent a uniform step-size quantizer. This restriction may be represented in the functional form of Q(x) and $\delta Q(x)$. Consider the quantizer function $Q(x) = Q_0(x) + \epsilon\,\delta Q(x)$, where $\delta Q(x)$ represents an arbitrary variation and $\epsilon$ is a real-valued constant. It should be noted that the term $\epsilon\,\delta Q(x)$ must be such that Q(x) is a legitimate quantizer characteristic. If the uniform step size restriction is in place, then Q(x) must satisfy this restriction. Clearly $\epsilon = 0$ is the optimum choice for this parameter; thus,

$$\frac{\partial D}{\partial \epsilon}\Big|_{\epsilon = 0} = 0, \tag{10}$$

or

$$E\{[X - Q_0(X)]\,\delta Q(X)\} = 0. \tag{11}$$

As proven to this point, the condition in (11) is only a necessary condition for the optimum quantizer. In order to prove that this condition is also sufficient we consider the error D for an arbitrary quantizer $Q(x) = Q_0(x) + \epsilon\,\delta Q(x)$:

$$\begin{aligned}
D &= E\{[X - Q(X)]^2\} \\
  &= E\{[X - Q_0(X)]^2\} - 2\epsilon E\{[X - Q_0(X)]\,\delta Q(X)\} + \epsilon^2 E\{[\delta Q(X)]^2\}.
\end{aligned} \tag{12}$$

The first term in this expression is the error for the optimum quantizer $Q_0(x)$. The second term is zero by (11), and the third term must be non-negative. Consequently (11) is both a necessary and sufficient condition for an optimum.

We can use (11) to show that for the optimum quantizer the mean of the output equals the mean of the input. To do this we choose the arbitrary variation $\delta Q(x) = 1$; therefore by (11)

$$E\{X - Q_0(X)\} = 0. \tag{13}$$

If we choose $\delta Q(x) = Q_0(x)$, then

$$E\{X Q_0(X)\} = E\{[Q_0(X)]^2\}, \tag{14}$$

and consequently,

$$E\{[X - Q_0(X)]^2\} = E\{X^2\} - E\{[Q_0(X)]^2\}, \tag{15}$$

and finally,

$$E\{[Q_0(X) - X]\,X\} = -E\{[X - Q_0(X)]^2\}. \tag{16}$$

Equation (15) states that the mean squared error for the optimum quantizer equals the input variance minus the output variance. Equation (16) indicates that the correlation between the quantization error and the quantizer input is equal to the negative of the mean squared error.
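The step from the orthogonality condition to the correlation property is short but not written out. One way to fill it in, offered as a reading aid and using only the condition (11) with the variation chosen equal to $Q_0(x)$, is to split X into the error plus the output:

```latex
\begin{align*}
E\{[X - Q_0(X)]\,X\}
  &= E\{[X - Q_0(X)]^2\} + E\{[X - Q_0(X)]\,Q_0(X)\} \\
  &= E\{[X - Q_0(X)]^2\},
\end{align*}
```

where the second term vanishes by (11). Changing the sign of the left side gives the stated result: the correlation of the error $Q_0(X) - X$ with the input equals minus the mean squared error.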

IV. First and Second Moment Properties of the
Optimum Vector Quantizer

As the second moment properties for the vector quantizer are similar to those properties discussed in the previous section for the scalar quantizer, we will only sketch their derivation. We consider the k-dimensional case where the distortion D is measured as

$$D = \frac{1}{k} E\{\|X - Q(X)\|^2\}, \tag{17}$$

where X and Q(X) are vector valued, and $\|\cdot\|$ denotes the usual Euclidean distance norm. Again a variational approach is employed, where an arbitrary quantizer function Q(x) is written in terms of the optimal quantizer as

$$Q(x) = Q_0(x) + \epsilon\,\delta Q(x),$$

for a vector-valued variation $\delta Q(x)$. As before, we take the derivative of D with respect to $\epsilon$, and set the result equal to zero at $\epsilon = 0$; the result is

$$E\{[X - Q_0(X)]^T\, \delta Q(X)\} = 0, \tag{18}$$

where $[\,\cdot\,]^T$ denotes the transpose of the column vector. This expression is both necessary and sufficient for the optimum quantizer $Q_0(x)$. The optimum vector quantizer also has the following properties analogous to those scalar quantizer properties found in (14), (15), and (16):

$$E\{[X - Q_0(X)]^T Q_0(X)\} = 0, \tag{19}$$

$$\frac{1}{k} E\{\|X - Q_0(X)\|^2\} = \frac{1}{k}\big[E\{\|X\|^2\} - E\{\|Q_0(X)\|^2\}\big], \tag{20}$$

and

$$\frac{1}{k} E\{[Q_0(X) - X]^T X\} = -\frac{1}{k} E\{\|Q_0(X) - X\|^2\}. \tag{21}$$

V. Zador's Random Quantization Bound

The quantizer input is a k-dimensional random vector in $\mathbb{R}^k$ which is quantized to one of N levels $y_1, y_2, \ldots, y_N$. The space $\mathbb{R}^k$ is partitioned into N disjoint and exhaustive regions $S_1, S_2, \ldots, S_N$. The quantizer is defined by the function Q(x), where for k-dimensional input value x,

$$Q(x) = y_i, \quad \text{if } x \in S_i.$$

The performance of the quantizer is measured by the distortion

$$D = \frac{1}{k} E\{\|X - Q(X)\|^r\}.$$

The case where r = 2 is the usual mean squared distortion. The expression derived by Zador [10] and Gersho [11] for the minimum distortion $D_0$ obtained by use of the best quantizer is

$$D_0 = N^{-r/k}\, C(k,r)\, \|p(x)\|_{k/(k+r)}, \tag{22}$$

where p(x) is the probability density for the input vector X, and

$$\|p(x)\|_{\alpha} = \Big[\int p(x)^{\alpha}\, dx\Big]^{1/\alpha}.$$

The constant C(k,r), called the coefficient of quantization, is independent of the density p(x) and is in general unknown. This expression is an asymptotic result valid only for large N. Two special cases for which the value of C(k,r) is known exactly are [11]

$$C(1,r) = \frac{1}{r+1}\, 2^{-r},$$

and

$$C(2,2) = \frac{5}{36\sqrt{3}}.$$

Consider the density p(x) that has a constant value of one over the unit volume hypercube; then, $\|p(x)\|_{k/(k+r)} = 1$. Consequently, Eq. (22) becomes

$$D_0 = N^{-r/k}\, C(k,r). \tag{23}$$

between the input sample and the nearest output level to be

$$f(\rho) = N\,[1 - V_k \rho^k]^{N-1}\, V_k\, k\, \rho^{k-1}. \tag{25}$$

Note that for large values of N this probability density goes to zero rapidly as $\rho$ increases. By construction $\rho = \|x - y_i\|$, where x is the input value and $y_i$ is the output value. Consequently,

$$E\{\|X - Q(X)\|^r\} = E\{\rho^r\}; \tag{26}$$

so,

$$D = \frac{1}{k} E\{\rho^r\} = \frac{1}{k}\int_{\text{hypercube}} \rho^r\, \rho^{k-1}\, N\,(1 - V_k\rho^k)^{N-1}\, k\, V_k\, d\rho.$$

If we make the change of variables $s = V_k\rho^k$, then we use the fact that s < 1 to write

$$D \le \frac{N}{k}\, V_k^{-r/k}\, \frac{\Gamma(1 + r/k)\,\Gamma(N)}{\Gamma(N + 1 + r/k)}. \tag{27}$$

So,  we  see  that  by  finding  a  bound  on  Og  we  also 

bound  C(k,r).  To  find  this  bound  we  choose  the 
quantizer  output  levels  to  have  a  random  distribu¬ 
tion  uniformly  distributed  over  the  hypercube.  For 
a  particular  input  value  £  we  find  the  closest 
output  'level  and  quantile  to  that  value.  Because 
this  quantizer  is  not  the  optimum,  the  associated 
distortion  will  bound  from  above  the  distortion  for 
the  optimum  quantizer. 

To  beqin,  place  at  random  N  independent  uniform¬ 
ly  distributed  k  dimensional  samples  in  the  hyper¬ 
cube.  These  will  be  our  output  levels.  We  take 
the  quantizer  input  £  to  have  a  uniform  distribu¬ 
tion  over  the  hypercube.  We  also  assume  that  N  is 
sufficiently  large  so  that  there  is  a  very  small 
probability  that  the  quantizer  input  is  closer  to 
an  edy-  of  the  hypercube  than  to  one  of  the  output 
values.  Suppose  that  an  input  value  £  has  arrived 
and  is  sitting  in  the  'ypercube  waiting  to  be  quan¬ 
tized.  The  probabili'  that  one  particular  output 
value  is  within  a  >'  an ce  o  of  this  input  sample 
is  given  approximate  ,  oy  the  volume  of  a  sphere  of 
radius  ibou>  th..t  sample  point,  or 

Prub  (one  particular  output  level  is  ^ 

within  o  of  the  input  sample)  ’  Vk'' 
where  if  V,.  i'  zolume  of  the  unit  radius  sphere, 

•h  r  is  the  volume  of  the  sphere  with  radius 

o.  We  ar .  interested  in  the  closest  level  to  the 
input  sample.  We  want  to  know  the  probability  that 
the  closest  >ut  level  is  within  a  distance  o  Of 
the  input  sa  pi  •  To  compute  this  probability,  we 
combine  .,ical  order  statistics  with  the  result 
found  ir  .;  ).  By  employing  this  approach,  we  com¬ 
pute  the  probability  density  f ( o >  for  the  distance 


where  r(*)  is  the  gamma  function.  For  large  N  the 
following  approximation  is  valid: 

k*r 

tin)  .  „*  nr 


If  f  p 

r(N*~-> 


Therefore, 


D  * 


N-r/kT(H£> 


kV, 


r/k 


(28) 


Because  t  >  Dg,  we  use  (22)  to  write 

r(1m£) 

C(k,r)  £  ..  , 

kV, 


(29) 


which  is  lador's  random  quantization  upper  bound. 
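As a numerical illustration of (29), the following minimal sketch evaluates the bound \Gamma(1 + r/k) / (k V_k^{r/k}) and compares it with the two exactly known coefficients C(1,r) and C(2,2) quoted above. It also checks the large-N gamma function approximation used between (27) and (28). The function names are hypothetical, and the unit-sphere volume formula V_k = \pi^{k/2} / \Gamma(k/2 + 1) is a standard result assumed here rather than taken from the text.

```python
import math


def unit_ball_volume(k):
    """Volume V_k of the unit-radius sphere in k dimensions."""
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1)


def random_quantization_bound(k, r):
    """Right-hand side of (29): Gamma(1 + r/k) / (k * V_k**(r/k))."""
    return math.gamma(1 + r / k) / (k * unit_ball_volume(k) ** (r / k))


def gamma_ratio_error(n, k, r):
    """Relative error of Gamma(N)/Gamma(N+1+r/k) ~ N**(-(k+r)/k)."""
    exact_log = math.lgamma(n) - math.lgamma(n + 1 + r / k)
    approx_log = -(k + r) / k * math.log(n)
    return abs(math.exp(exact_log - approx_log) - 1.0)


# Exactly known coefficients quoted in the text, evaluated at r = 2.
KNOWN_COEFFICIENTS = {
    (1, 2): 2.0 ** -2 / (2 + 1),            # C(1,r) = 2**(-r) / (r + 1)
    (2, 2): 5.0 / (36.0 * math.sqrt(3.0)),  # C(2,2)
}

if __name__ == "__main__":
    for (k, r), exact in KNOWN_COEFFICIENTS.items():
        bound = random_quantization_bound(k, r)
        print(f"k={k} r={r}: exact C={exact:.5f}, random bound={bound:.5f}")
    print("gamma approximation error at N=10**4:",
          gamma_ratio_error(10 ** 4, 2, 2))
```

In both known cases the random quantization bound lies above the exact coefficient, as expected for an upper bound obtained from a non-optimum quantizer.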


VI. Companding in Several Dimensions

For one dimensional quantizers companding provides a method whereby asymptotically optimum quantizers may be implemented in a straightforward fashion. In several dimensions the compressor characteristic is a mapping function f: R^k \to \times_{i=1}^{k} (0,1), where \times denotes the Cartesian cross product. The set \times_{i=1}^{k} (0,1) is of course the k-dimensional hypercube. In the companding approach to optimal quantization, we have quantizer output levels distributed in the hypercube. We choose from these output levels the nearest neighbor to f(x), where x is the input data vector. Our




quantized output is then f^{-1} of this particular output level. Denote the error vector caused by quantization in the hypercube as (r_1, r_2, ..., r_k)^T and impose the condition that E\{r_i r_j\} = \sigma_r^2 \delta_{ij}, where \delta_{ij} is the Kronecker delta. It may be shown that, as the number of output levels N in the hypercube approaches infinity, the error vector for an optimal quantizer converges to a hyperspherically symmetric probability density which satisfies the above condition. In addition, for large N there are an infinite number of quantizers each of which has approximately the same near optimum error. These quantizers may be generated as translations of one another within the hypercube. A simple way to visualize this fact is by use of the one dimensional companding system where the compressor function output is a uniformly distributed (0,1) random variable. We can form translations of the uniform quantizer and still obtain approximately the same mean squared error in the expander output. So, we may consider an ensemble of near optimum quantizers over the hypercube, where each quantizer approaches the optimum quantizer in the asymptotic (large N) case. By allowing ourselves to choose (in an arbitrary fashion) from this ensemble of quantizers for each input vector (x_1, x_2, ..., x_k), we can decouple the error vector (r_1, r_2, ..., r_k)^T from the input so as to make this error vector approximately independent of the input vector. This procedure is analogous to the technique of assigning a random time origin to sampling operations in order to model the sampled signals as wide sense stationary processes.

Our data will be assumed to be k-dimensional samples from a probability density function p(x), x \in R^k. Denote S_p as the support of p(x). Let f: S_p \to \times_{i=1}^{k} (0,1) such that f is regular and onto. We can represent this mapping as

    f = (f_1(x), f_2(x), ..., f_k(x))^T.

Let r = (r_1, r_2, ..., r_k)^T be the error vector in the hypercube. Assuming very small distortion, a good approximation to the final error vector in the output is (f^{-1})'(f(x)) r, where (f^{-1})'(\cdot) represents the matrix of partial derivatives of the inverse operator f^{-1}. Let y be the variable in the hypercube. If y = f(x), then the probability density for y may be written in terms of the probability density for x as

    p_y(y) = \frac{p_x(f^{-1}(y))}{|f'(f^{-1}(y))|}.

Therefore the final output mean square error D may be written

    D = \int_{\times_{i=1}^{k} (0,1)} r^T \{[f'(f^{-1}(y))]^{-1}\}^T [f'(f^{-1}(y))]^{-1} r \, \frac{p_x(f^{-1}(y))}{|f'(f^{-1}(y))|} \, dy.

Let x = f^{-1}(y); then dx = |(f^{-1})'(y)| dy, and note that

    |(f^{-1})'(y)| = \frac{1}{|f'(f^{-1}(y))|}

by the inverse mapping theorem. Making these changes we can write

    D = \int_{S_p} r^T \{[f'(x)]^{-1}\}^T [f'(x)]^{-1} r \, p_x(x) \, dx.


Denote [f'(x)][f'(x)]^T = \Sigma(x) and note this is a symmetric matrix for every x. Therefore our problem is to minimize

    D = \int_{S_p} r^T \Sigma^{-1}(x) r \, p_x(x) \, dx.

As discussed earlier, there is an ensemble of near optimum quantizers. If we now average the distortion D over this ensemble, we assume that the error vector r is sufficiently decoupled from the input vector so as to be treated as an independent random quantity. Consequently, we have

    D = \int_{S_p} tr\{\Sigma^{-1}(x) r r^T\} p_x(x) \, dx
      = \sigma_r^2 \int_{S_p} tr\{\Sigma^{-1}(x)\} p_x(x) \, dx.

So, the total error is a product of two terms operating independently of one another. Denote the eigenvalues of \Sigma(x) as \lambda_i^2(x) (i = 1,...,k). Then

    D = \sigma_r^2 \sum_{i=1}^{k} \int_{S_p} p_x(x) \lambda_i^{-2}(x) \, dx.    (30)

Consider, for the moment, a random vector with a uniform distribution over the hypercube \times_{i=1}^{k} (0,1); the f^{-1}(\cdot) function maps this vector to a vector in R^k with support S_p and density |f'(x)|. Therefore we have

    \int_{S_p} |f'(x)| \, dx = \int_{S_p} \prod_{i=1}^{k} \lambda_i(x) \, dx = 1.    (31)

The problem now is to minimize the expression in (30) subject to the constraint in (31). We may do this in the following fashion: (1) Assume that except for \lambda_i(x) all of the \lambda_j(x) are the optimum choice. (2) Use a variational method to optimize \lambda_i(x) subject to the constraint (31). The result is that \lambda_i(x) = \lambda(x) for all i and that the optimum \lambda(x) is

    \lambda(x) = \left[ \frac{p(x)}{\|p(x)\|_{k/(k+2)}} \right]^{1/(k+2)}.    (32)


Using these eigenvalues, we find that the minimum error D_0 is given by

    D_0 = k \sigma_r^2 \|p(x)\|_{k/(k+2)}.

If an optimal k-dimensional uniform quantizer is implemented in the hypercube (this determines the value of \sigma_r^2) this expression gives the same error as Zador's optimum quantizer [10]. Additional results on the properties and implementation of multi-dimensional companding systems are presented in an as yet unpublished paper by Bucklew [18].
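The scalar case (k = 1) offers a simple check of (30) through (32). There \lambda(x) is the slope of the compressor, (32) reduces to \lambda(x) = p(x)^{1/3} / \int p(x)^{1/3} dx, and the distortion factor D / \sigma_r^2 equals \|p(x)\|_{1/3}. The sketch below is an illustration under stated assumptions, not the author's procedure: it takes a unit Gaussian density, truncates the real line to [-12, 12], and integrates with a trapezoid rule. All function names are hypothetical. For the unit Gaussian, \|p(x)\|_{1/3} = 6\pi\sqrt{3}, which the numerical value should reproduce.

```python
import math


def gaussian_pdf(x):
    """Zero-mean, unit-variance Gaussian density (an illustrative choice)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integrate(func, a, b, steps=20000):
    """Composite trapezoid rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * h)
    return total * h


def optimum_slope(pdf, a, b):
    """Scalar case of (32): p(x)**(1/3) divided by the integral of p**(1/3)."""
    norm = integrate(lambda x: pdf(x) ** (1.0 / 3.0), a, b)
    return lambda x: pdf(x) ** (1.0 / 3.0) / norm


def distortion_factor(pdf, slope, a, b):
    """D / sigma_r**2 from (30) with k = 1: integral of p(x) / lambda(x)**2."""
    return integrate(lambda x: pdf(x) / slope(x) ** 2, a, b)


A, B = -12.0, 12.0  # truncation of the real line; the neglected tails are tiny

best = optimum_slope(gaussian_pdf, A, B)

# A non-optimal slope that also satisfies the constraint (31), for comparison.
_sqrt_norm = integrate(lambda x: math.sqrt(gaussian_pdf(x)), A, B)


def other(x):
    return math.sqrt(gaussian_pdf(x)) / _sqrt_norm


if __name__ == "__main__":
    print("constraint (31):", integrate(best, A, B))
    print("optimum D/sigma_r^2:", distortion_factor(gaussian_pdf, best, A, B))
    print("closed form 6*pi*sqrt(3):", 6.0 * math.pi * math.sqrt(3.0))
    print("non-optimal D/sigma_r^2:", distortion_factor(gaussian_pdf, other, A, B))
```

With \sigma_r^2 = 1/(12 N^2) for an N-level uniform quantizer on (0,1), the optimum factor gives the familiar scalar result D_0 = \sqrt{3}\pi / (2 N^2) for a unit Gaussian input.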


References 

1. W. R. Bennett, "Spectra of Quantized Signals," Bell Syst. Tech. J., Vol. 27, pp. 446-472, July 1948.

2. P. F. Panter and W. Dite, "Quantization in Pulse-Count Modulation with Nonuniform Spacing of Levels," Proc. IRE, Vol. 39, pp. 44-48, 1951.

3. B. Widrow, "A Study of Rough Amplitude Quantization by Means of Nyquist Sampling Theory," IRE Trans. Circuit Theory, Vol. CT-3, pp. 266-276, Dec. 1956.

4. B. Smith, "Instantaneous Companding of Quantized Signals," Bell Syst. Tech. J., Vol. 36, pp. 653-709, May 1957.

5. S. P. Lloyd, "Least Squares Quantization in PCM," unpublished memorandum, Bell Laboratories, 1957.

6. J. Max, "Quantization for Minimum Distortion," IRE Trans. Inform. Theory, Vol. IT-6, pp. 7-12, Mar. 1960.

7. P. E. Fleischer, "Sufficient Conditions for Achieving Minimum Distortion in a Quantizer," IEEE Int. Conv. Rec., Vol. I, 1964, pp. 104-111.

8. A. B. Sripad and D. L. Snyder, "A Necessary and Sufficient Condition for Quantization Errors to be Uniform and White," IEEE Trans. Acoust. Speech, Signal Processing, Vol. ASSP-25, pp. 442-448, Oct. 1977.

9. J. A. Bucklew and N. C. Gallagher, "A Note on Optimal Quantization," IEEE Trans. Inform. Theory, Vol. IT-25, pp. 365-366, May 1979.

10. P. Zador, Development and Evaluation of Procedures for Quantizing Multivariate Distributions, Ph.D. Dissertation, Stanford University, 1964, University Microfilms, Inc., Ann Arbor, Michigan, No. 64-9855.

11. A. Gersho, "Asymptotically Optimal Block Quantization," IEEE Trans. Inform. Theory, Vol. IT-25, pp. 373-380, July 1979.

12. T. Yamada, S. Tazaki, and R. M. Gray, "Asymptotic Performance of Block Quantizers with Difference Distortion Measures," To appear in IEEE Trans. Inform. Theory.

13. W. A. Pearlman, Quantization Error Bounds for Computer Generated Holograms, Stanford Univ. Inform. Syst. Lab., Stanford, CA, Tech. Rep. No. 6503-1, Aug. 1974.

14.  N.  C.  Gallagher,  "Optimum  Quantization  in  Di¬ 
gital  Holography,"  Appl.  Opt.,  vol.  17,  pp. 
109-115,  Jan.  1,  1978. 

15.  W.  A.  Pearlman  and  R.  M.  Gray,  "Source  Coding 
of  the  Discrete  Fourier  Transform,"  IEEE 
Trans.  Inform.  Theory,  Vol.  IT-24,  pp. 
683-692,  Nov.  1978. 

16.  J.  A.  Bucklew  and  N.  C.  Gallagher,  "Quantiza¬ 
tion  Schemes  for  Bivariate  Gaussian  Random 
Variables,"  IEEE  Trans.  Inform.  Theory,  Vol. 
IT-25,  pp.  537-543,  Sept.  1979. 

17.  J.  A.  Bucklew  and  N.  C.  Gallagher,  "Two- 
Dimensional  Quantization  of  Bivariate  Circu¬ 
larly  Symmetric  Densities,"  IEEE  Trans.  In¬ 
form.  Theory,  Vol.  IT-25,  pp.  667-671,  Nov. 
1979. 

18.  J.  A.  Bucklew,  "Companding  and  Random  Quanti¬ 
zation  in  Several  Dimensions,"  Submitted  for 
publication. 




PASSBAND  AND  STOPBAND  PROPERTIES 
OF  MEDIAN  FILTERS* 


N.  C.  Gallagher,  Jr. 

School  of  Electrical  Engineering 
Purdue  University 
W. Lafayette, IN 47907


G.  L.  Wise 

Department  of  Electrical  Engineering 
University  of  Texas 
Austin,  Texas  78712 




ABSTRACT 

Median filtering is a signal smoothing technique that has been applied successfully in speech and picture processing. However, the method suffers from the want of a formal theory by which filter properties may be studied. In this paper necessary and sufficient conditions for a signal to be invariant under median filtering are derived. These conditions state that a signal must be locally monotone to pass through a median filter unchanged. It is proven that successive median filtering of a signal (i.e. the filtered output is itself filtered) reduces the original signal to an invariant signal called a root signal. For a signal of length L samples a maximum of (L-2)/2 repeated filterings produces a root signal.


*The research was supported by the Air Force Office of Scientific Research under grants AFOSR 78-3605 and AFOSR 76-3062.


I. Introduction

In many signal processing applications a method called median filtering has achieved some very interesting results. One useful characteristic of median filtering is its ability to preserve signal edges while also filtering out impulses. Promising applications of median filtering are picture processing and speech processing [1-3]. These applications employ the median filter as a signal smoother. The implementation of a median filter requires a very simple digital nonlinear operation. To begin, we take a sampled and quantized signal of length L; across this signal we slide a window that spans 2N+1 signal sample points. The filter output is set equal to the median value of these 2N+1 signal samples. The filter output is associated with the time sample at the center of the window. To account for start up and end effects at the two endpoints of the L-length signal, N samples are appended to the beginning and the end of the sequence. The appended samples are constant and equal in value to the first and last samples of the original sequence, respectively. As an example, consider the binary valued sequence of Fig. 1(a), where L=10 and N=1; the median filtered signal is plotted below the input signal. The appended values are marked as X's. Figure 1(b) illustrates the filtering of the same input signal as for Fig. 1(a) but we set N=2; we set N=3 for the example in Fig. 1(c). The signal of Fig. 1 passes undisturbed through the N=1 filter; however it is affected by the N=2 and N=3 filters. The signal would be reduced to a constant value by an N=4 filter.

[Figure 1: input and filter output signals for each panel; the window moves from left to right; appended values are marked X]

Fig. 1. Signal Filtered by Three Different Median Filters (a) N = 1, (b) N = 2, and (c) N = 3.

The results illustrated in Fig. 1 suggest the concept of a filter "passband" and "stopband". The given signal is in the passband of the N=1 filter and the stopband of the N=4 filter. If we view the median filter as one that passes edges but not impulses, then edges for an N=1 filter may be impulses for an N=4 filter. But what about the N=2 and N=3 filters? Suppose the signal of Fig. 1 is filtered twice in succession by the N=2 filter; in other words, the filtered output is again filtered. The result is a constant output identical to that obtained by a single filtering with an N=4 filter. If the constant is filtered again, the output is the same as the filter input; the constant is invariant to median filtering. So, by filtering the original signal two times with an N=2 or N=3 filter we have a resulting signal that is invariant to successive filterings, the same result obtained by a single pass with the N=4 filter. Note that the input signal of Fig. 1 is invariant to repeated filtering with an N=1 filter. We see that signals which do not reside entirely within the filter "passband" can be reduced to their passband component by repeated filterings.
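The filtering operation described in the Introduction (a sliding window of 2N+1 samples, with N copies of the first and last samples appended at the two ends) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name and the test signals are hypothetical and are not the signal of Fig. 1.

```python
def median_filter(signal, n):
    """Median filter with a window of 2*n + 1 samples.

    n copies of the first sample are appended to the beginning and n copies
    of the last sample to the end, so the output has the same length as the
    input and each output is tied to the sample at the window center.
    """
    padded = [signal[0]] * n + list(signal) + [signal[-1]] * n
    window = 2 * n + 1
    # The median of an odd-length window is the middle element after sorting.
    return [sorted(padded[i:i + window])[n] for i in range(len(signal))]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]   # one-sample impulse
    edge = [0, 0, 0, 1, 1, 1]         # step between two constant regions
    pulse = [0, 0, 1, 1, 0, 0, 0]     # two-sample pulse

    print(median_filter(impulse, 1))  # impulse is removed
    print(median_filter(edge, 2))     # edge passes unchanged
    print(median_filter(pulse, 1))    # passes the N=1 filter
    print(median_filter(pulse, 2))    # removed by the N=2 filter
```

The two-sample pulse shows the passband and stopband idea in miniature: it is a pair of edges for the N=1 filter but an impulse for the N=2 filter.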

At present, there has been no proposed median filter design procedure. There is no method by which the filter window size can be designed to account for some special properties of the signal or noise; the only way of doing this is by trial and error. In this paper we initiate the development of a formal theory for median filters. We will formalize the concepts of filter passband and stopband. We describe desirable signal characteristics for signals employed in median filtering and show how some types of noise can be completely removed by median filtering and how other types cannot be removed. These results will be presented through the development of a formal theory of median filtering. In section II we present some basic definitions that allow us to precisely state and prove a number of interesting results.

II. Theory for Median Filtering

In order to give a precise statement for the theorems presented later in the section a number of definitions are necessary. We will always be working with a signal of L samples, where each sample is quantized to one of K different values. The filter window length is the number of consecutive samples considered when computing the running median. We will always take the window length to be an odd integer (2N+1) for N=0,1,2,... As noted earlier, our convention is that the filter output at position l is the median value obtained when position l is in the center of the window. We define the following signal characteristics:

1. A constant neighborhood is at least N+1 consecutive identically valued points.

2. An edge is a monotonic change between two constant neighborhoods of different value: the last point of the first constant neighborhood is the first point of the monotonic change, and the last point of the monotonic change is the first point of the second constant neighborhood.

3. An impulse is a constant neighborhood followed by at least one but no more than N points which are then followed by another constant neighborhood having the same value as the first constant neighborhood. The two boundary points of these at most N points do not have the same value as the two constant neighborhoods.

4. An oscillation is a sequence of points which is not part of a constant neighborhood, an edge, or an impulse.

Of particular interest is the class of signals that can pass through the filter unchanged as well as the class of signals that are completely removed by filtering. Assume that an L-length signal is filtered with a 2N+1 window. As noted previously, we always append to the beginning of the signal an additional N constants equal in value to the first sample of the signal. Similarly, N constant points are appended to the end of the L-length signal. By doing this, we assure that when the initial signal's first or last sample is in the center of the window, the median filter output equals this sample value. For a signal to pass through a median filter unchanged means that the central sample value for each window position is itself the median of the samples within the window.

*It has recently come to our attention that S. Tyan has proven a version of this theorem in an unpublished manuscript. We have not seen a copy of this manuscript and can only speculate as to its contents.

Consider a signal unchanged by median filtering. Assume that the window increments from sample to sample moving from left to right across the signal and that the window is now centered at the second signal sample of the original signal. We know that the N points to the left of center have the same constant value. If they equal the value of the center point, then it (the center point) must be the median. If they are less than the value of the center point, then the N points to the right of center must be all greater than or equal to the central value. If the N points to the left are greater in value than the central point, then the N points to the right are all less than or equal to the center value. Thus note that the leftmost N+2 points in the window form a monotone sequence of points. Increment the window another sample to the right, so that the window is now centered at the third signal sample. The leftmost N+1 samples in the window form a monotone sequence. Assume that the N leftmost points in the window are not greater than (respectively, not less than) the center point. Then, since the center point is the median value of the points in the window, the N rightmost points in the window must be not less than (respectively, not greater than) the center point. Thus we see once again that the leftmost N+2 points in the window form a monotone sequence. Increment the window another sample to the right. By applying the same argument as before, we again find that the N+2 leftmost points in the window form a monotone sequence. Indeed, a straightforward inductive argument proves that the leftmost N+2 points in the window form a monotone sequence regardless of the window position. Recalling that the appended signal has N constant points appended to the right of the original signal, we see that the appended signal is such that any consecutive N+2 points must be monotone. Thus a signal invariant to median filtering must be such that the appended signal contains only constant neighborhoods and edges.

Now  assume  that  the  appended  signal  contains 
only  constant  neighborhoods  and  edges.  If  the 
center  of  the  window  is  at  any  signal  sample,  then 
the  points  in  the  window  are  either  monotone  or 
non-monotone.  If  the  points  are  monotone,  then 
the  signal  sample  at  the  center  of  the  window  is 
not  changed  by  the  median  filter.  If  they  are 
non-monotone,  then  the  window  must  be  centered  on 
a  point  in  the  constant  neighborhood  shared  by  two 
edges.  Of  the  2N+1  points  in  the  window,  at  least 
N+1  of  them  are  equal  to  the  center  point,  and 
thus the center point is unchanged by median
filtering.

These  observations  are  formalized  in  the  fol¬ 
lowing  theorem. 

Theorem 1. Given a length L, K-valued sequence to be median filtered with a 2N+1 window, a necessary and sufficient condition for the signal to be invariant under median filtering is that the appended signal consist only of constant neighborhoods and edges*.

The following corollary is a direct result of this theorem.

Corollary. For a median filter invariant signal to contain both regions of increase and decrease, the points of increase and decrease must be separated by a constant neighborhood (at least N+1 consecutive identical points).
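Theorem 1 can be checked numerically on small cases. The sketch below restates the condition in the form used in the preceding argument (any N+2 consecutive points of the appended signal are monotone) and compares it with direct filtering over every binary signal of a given length. The function names are hypothetical, and the exhaustive comparison is an illustration for short binary signals, not a proof.

```python
from itertools import product


def append_endpoints(signal, n):
    """Append n copies of the first and last samples to the two ends."""
    return [signal[0]] * n + list(signal) + [signal[-1]] * n


def median_filter(signal, n):
    """Median filter with a 2*n + 1 window on the appended signal."""
    padded = append_endpoints(signal, n)
    return [sorted(padded[i:i + 2 * n + 1])[n] for i in range(len(signal))]


def is_locally_monotone(signal, n):
    """True if every n + 2 consecutive points of the appended signal are monotone."""
    padded = append_endpoints(signal, n)
    span = n + 2
    for start in range(len(padded) - span + 1):
        segment = padded[start:start + span]
        pairs = list(zip(segment, segment[1:]))
        nondecreasing = all(a <= b for a, b in pairs)
        nonincreasing = all(a >= b for a, b in pairs)
        if not (nondecreasing or nonincreasing):
            return False
    return True


def theorem1_agrees(length, n, levels=2):
    """Compare invariance under filtering with local monotonicity for all signals."""
    return all(
        (median_filter(s, n) == list(s)) == is_locally_monotone(s, n)
        for s in product(range(levels), repeat=length)
    )


if __name__ == "__main__":
    for n in (1, 2, 3):
        print("N =", n, "agreement on all binary signals of length 8:",
              theorem1_agrees(8, n))
```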


As a result of this theorem it is possible to construct signals that are invariant to median filtering. Also, given the space of all length-L, K-valued signals S it is possible to identify all those signals invariant to median filtering with a 2N+1 window. We will call these signals the roots of the filter, and this set of signals is denoted as R_N. Note that R_N \subset S for any N and that we have the following lemma.


Lemma 1: For an L-length K-valued set of signals S, the root sets are nested such that

    ... \subset R_{N+1} \subset R_N \subset ... \subset R_0 = S.


Proof. If a signal is invariant to a filter of window length 2(N+1)+1, then each neighborhood of N+3 samples is monotone. Consequently each neighborhood of length N+2 is monotone and the signal is invariant to a filter window of length 2N+1; i.e. R_{N+1} \subset R_N. It is trivial to verify that a window of length 1 reproduces any signal exactly upon filtering because the median value of a set containing just one point is the value of that point; thus, R_0 = S.
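The nesting in Lemma 1 can be illustrated by enumerating the root sets of short binary signals. The following minimal sketch uses hypothetical function names and the end-point convention described earlier; it lists the roots by direct filtering and checks that each set contains the next.

```python
from itertools import product


def median_filter(signal, n):
    """Median filter with a 2*n + 1 window; end samples are repeated n times."""
    padded = [signal[0]] * n + list(signal) + [signal[-1]] * n
    return [sorted(padded[i:i + 2 * n + 1])[n] for i in range(len(signal))]


def root_set(length, n, levels=2):
    """All signals of the given length that are invariant to the filter."""
    return {
        s for s in product(range(levels), repeat=length)
        if median_filter(s, n) == list(s)
    }


if __name__ == "__main__":
    length = 6
    roots = [root_set(length, n) for n in range(4)]
    for n, r in enumerate(roots):
        print("N =", n, "number of binary roots:", len(r))
    # Each root set should contain the root set of the next larger window.
    print(all(roots[n + 1] <= roots[n] for n in range(3)))
```

For N = 0 the window holds a single sample, so every one of the 2^L binary signals is a root, in agreement with R_0 = S.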


We have established that for a given filter window 2N+1 and a signal set S, there exists a root set R_N of signals invariant to filtering. For a given L-length signal s we represent the median filtered version of s by f_N(s) for a 2N+1 size window. We represent by f_N^{(2)}(s) the twice filtered signal:

    f_N^{(2)}(s) = f_N[f_N(s)].

We define f_N^{(n)}(s) as the n-times filtered signal:

    f_N^{(n)}(s) = f_N[f_N^{(n-1)}(s)].


If s = f_N(s), then s is a root of the filter. We next prove that for any signal s there exists an n such that f_N^{(n)}(s) = r, where r is a root.

Suppose we are given an L-length signal s that is not a root. Recall that N constant points are appended to the beginning of the signal. By construction, the first original signal point is the median of the interval for which it is the central point. As we slide the window from left to right across the signal, the first point to move (i.e. where the window's central point is not the median) must, by definition, be either a point contained in an impulse or oscillation. Suppose it is an impulse. By construction an impulse has two constant neighborhoods of equal value on either side, and every point in the impulse is filtered to this constant value by one pass of the filter window. Suppose the first point to be moved is contained in an oscillation. Let p be the last point unaffected by the median filter, and assume the filter is centered at this point. Then the leftmost N+2 points must be monotone as seen in the proof of Theorem 1. Assume without loss of generality that they are monotone nondecreasing. Assume that the window is now centered at the point p+1. By hypothesis, this point must change in value. Recall that the leftmost N points are not greater in value than the center point. If the N rightmost points were greater than or equal to the center value, then this value at p+1 would be the median. Thus, at least one point to the right of center must have a value less than that of p+1. Thus there are N+1 points in the window not greater in value than the center point, and the center point changes. Therefore it changes downward in value. Note that it can never achieve a value less than the value of the immediately preceding constant neighborhood because there are always at least N+1 points contained in the window including p+1 itself whose values are all greater than or equal to the constant neighborhood.

So we see that the first point that changes under filtering is preceded by but not necessarily adjacent to an invariant constant neighborhood, and the point is contained either in an impulse or oscillation. We also see that upon filtering, the value of this point moves closer to the value of the constant neighborhood. There are two possibilities: the value of point p equals the value of p+1, or the value of point p+1 is greater than that of p. In addition, it can be shown that the value of point p+1 is greater than the value of point p. Suppose that the two points have the same value. As the window increments from position p to p+1 one point moves out of the window on the left side and another point moves into the window on the right. The point that moves out on the left has a value less than or equal to that of point p+1. Because we know that the filtered value of p+1 is less than the original value, the point that moves in on the right side must also have a value less than that of p+1, otherwise the value of p+1 cannot decrease. If the value of point p+1 is the same as that of p then there remain N points in the window less than or equal to the value at p+1 (and at p) and also N points in the window greater than or equal to the value at p+1; consequently, point p+1 is the median and would not change. Thus, the value of the first point to change must be greater than its predecessor.


Recall what is known concerning the last consecutive point p that is invariant to filtering. The N points in the window to the left of the center point p are all less than or equal to p in value; the N points to the right of p are all greater than or equal to p in value. When the next point, p+1, is centered in the window there will be at least N points less than or equal to p in value and at least N+1 points greater than or equal to p in value. Therefore the median value cannot be less than the value of p. For convenience we summarize this as the following.

Observation 1: The first point to change value during a median filtering operation must be on the opposite side of its predecessor than the most recent constant neighborhood, and this point upon filtering moves toward its predecessor but does not move past its predecessor.


Continuing in this fashion, consider the point
following p+1; that is, p+2. Note that the value


of  p+2  is  greater  than  or  equal  to  the  value  of  p. 
As  the  window  is  incremented  to  the  right,  p+2  is 
centered  in  the  window  and  a  point  moves  out  of 
the  window  on  the  left.  A  new  point  enters  the 
window  on  the  right.  The  value  of  this  point  must 
be  either  greater  than  that  of  p  or  less  than  or 
equal  to  the  value  of  p.  If  it  is  less  than  or 
equal  to  the  value  of  p ,  then  there  are  at  least 
N-1  points  in  the  window  with  values  less  than  or 
equal  to  p  and  at  least  N+1  points  with  values 
greater  than  or  equal  to  p.  Consequently,  p+2  can 
not  be  filtered  to  a  value  less  than  p.  If  the 
value  of  the  new  point  is  greater  than  that  of  p, 
then  trivially,  the  filtered  value  of  p+2  can  not 
be  less  than  that  of  p.  The  same  reasoning  can  be 
applied  to  points  p+3,  p+4,  ...,  p+N.  For  con¬ 
venience,  we  summarise  this  as  the  following. 

Observation 2: After filtering, the N rightmost points in the
window centered at p must all have values equal to that of p or
on the opposite side of the value of p from the most recent
constant neighborhood.

Consequently  the  value  of  p  is  always  invariant 
to  median  filtering,  and,  in  addition  the  same  ar¬ 
gument  applies  to  any  other  (invariant)  point  to 
the  left  of  p.  Also,  the  point  p+1  has  one  of  two 
possible  filtered  values,  as  follows. 

Observation 3: Of all the values in the window centered at p+1,
the filtered value of p+1 is either the value of p or the
closest value to p on the opposite side from the most recent
constant neighborhood.


Thus at most ½(L-2) window passes are required to reduce the
signal to a root. As a result of the previous discussion we
have the following theorem for an L-length signal.

Theorem 2. Upon successive median filter window passes any
non-root signal will become a root after a maximum of ½(L-2)
successive filterings. Also, any non-root signal cannot repeat,
and the first point to change value on any pass of the filter
window will remain constant upon successive window passes.

To illustrate this characteristic of median filtering consider
the binary valued L=8 signal of Fig. 2. This signal will be
repeatedly filtered by use of a window length of 3 samples. The
appended constant terms are marked with x's. We see that
½(L-2) = 3 window passes are required to reduce this signal to
a root.
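The behavior described by Theorem 2 can be checked numerically. The following is a minimal sketch rather than the authors' program: the function names are hypothetical, the alternating test signal is an assumed example (the exact signal of Fig. 2 is not reproduced here), and the end values are repeated N times to play the role of the appended constant points.

```python
def median_filter(signal, n):
    """One pass of a (2n+1)-point median filter.

    Assumption: n copies of the first and last sample are appended to the
    front and back of the signal before filtering.
    """
    padded = [signal[0]] * n + list(signal) + [signal[-1]] * n
    return [sorted(padded[i:i + 2 * n + 1])[n] for i in range(len(signal))]


def filter_to_root(signal, n):
    """Filter repeatedly until the signal stops changing.

    Returns the root and the number of passes that changed the signal.
    """
    current, passes = list(signal), 0
    while True:
        filtered = median_filter(current, n)
        if filtered == current:
            return current, passes
        current, passes = filtered, passes + 1


if __name__ == "__main__":
    signal = [0, 1, 0, 1, 0, 1, 0, 1]   # assumed binary L = 8 example
    root, passes = filter_to_root(signal, n=1)
    print(root, passes)
```

For this assumed signal and a 3-point window the sketch needs three passes, which equals the bound ½(L-2) for L = 8.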

[Figure panels: Original Signal; After One Filter Pass; After Two Passes; After Three Passes (A Root). Appended constant points are marked with x.]


By using an argument similar to that just presented we reason
that the filtered values of p+2 through p+N are greater than or
equal to the filtered value of p+1. If the filtered value of
p+1 is the same as the value of p, then point p+1 is invariant
to filtering on the next pass of the window because it is not
greater than the value of p. Suppose, however, that the
filtered value of point p+1 is greater than that of p. We must
reexamine the pre-filtered point values. When p+1 is in window
center, the N+1 rightmost points must all have values greater
than that of p, including the rightmost point p+N+1. As a
result, when p+N+1 is in window center, the leftmost N+1 points
have values greater than that of p and the filtered value of
p+N+1 must be greater than that of p. Consequently, on the
second pass of the window, after all the points have been
filtered once, when point p+1 is in window center, the N
leftmost points are all less in value than that of p+1, and the
rightmost N points all have values greater than or equal to
that of p+1. Thus, p+1 is the median of the window and does not
change value upon the second filtering. This yields the
following.

Observation 4: The first point to change value on a median
filtering operation remains invariant upon additional filter
passes.

When the observation is made that the median filtering
operation is independent of whether the window moves from right
to left or left to right across the signal, we see that the
properties of the first point to change value apply also to the
last point in the signal to change value. Because of the
appended constant valued points to the front and back of the
L-length signal, the first and last signal points are invariant
to filtering.


Fig. 2. Result of Repeated Median Filtering.

To this point, it has always been assumed that the signal is
quantized to K levels. For an L-length signal this requirement
is not needed because an L-length signal can have at most L
different values even if the signal samples are not quantized
to specific values. Thus, we can always bound K from above by
the value of L, and all results stated in this paper apply to
unquantized signals.

III.  Discussion 

The development in the preceding section suggests a number of
interesting results. First, we note that every signal in the
space of signals, s ∈ S, can be filtered to a unique root with
a bounded number of repeated filterings. Thus, the elements of
the root set R partition S as illustrated in Fig. 3, where it
is shown how the signal space


Fig. 3. Partition of the Signal Space S by Eight Roots.




is partitioned by a root set with eight elements, where upon
repeated filtering every signal s ∈ S_1 is filtered to root
r_1 ∈ R, and so on; we will call S_i the ancestor set of root
r_i. If a signal s requires l filter passes to reach the root
r_i we say that s is an l-th generation ancestor of r_i. We
know from Theorem 2 that any root has at most ½(L-2) ancestral
generations, and we know that the root of a signal depends on
the filter window size, i.e., a root for a window of size 3 may
not be a root for a window of size 5, although a root for a
size 5 window is always a root for a size 3 window. In a loose
sense, median filters are a type of lowpass filter with an
increasingly narrow passband as the window size increases.
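The partition of the signal space into ancestor sets can be enumerated directly for short binary signals. The sketch below is illustrative only: the helper names are hypothetical, the signal length L = 8 is an assumed example, and end values are repeated N times as the appended constant points. It groups every binary signal by the root it reaches and records the generation of each ancestor.

```python
from itertools import product


def median_filter(signal, n):
    """One pass of a (2n+1)-point median filter with n appended end values."""
    padded = (signal[0],) * n + tuple(signal) + (signal[-1],) * n
    return tuple(sorted(padded[i:i + 2 * n + 1])[n] for i in range(len(signal)))


def root_and_generation(signal, n):
    """Return the root reached by repeated filtering and the passes needed."""
    current, passes = tuple(signal), 0
    while True:
        filtered = median_filter(current, n)
        if filtered == current:
            return current, passes
        current, passes = filtered, passes + 1


def ancestor_sets(length, n):
    """Map each root to the (signal, generation) pairs that filter to it."""
    sets = {}
    for bits in product((0, 1), repeat=length):
        root, generation = root_and_generation(bits, n)
        sets.setdefault(root, []).append((bits, generation))
    return sets


if __name__ == "__main__":
    L = 8  # assumed example length
    for n in (1, 2):
        sets = ancestor_sets(L, n)
        deepest = max(g for members in sets.values() for _, g in members)
        print(f"window {2 * n + 1}: {len(sets)} roots, deepest generation {deepest}")
```

Under these assumptions the deepest generation stays within ½(L-2), and every root found for the 5-point window also appears among the roots of the 3-point window.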

The  application  of  median  filtering  to  signal 
smoothing  problems  introduces  an  interesting  twist 
to  the  concepts  of  signal  and  noise.  A  median 
filter  has  no  design  parameters  other  than  window 
size. It cannot be designed to accommodate special signal or
noise characteristics. In the extreme case a filter can
completely remove a signal
component  leaving  only  noise.  It  seems  desirable 
that  a  noise-free  signal  be  a  root  signal  in  order 
that  it  is  invariant  to  median  filtering.  If  the 
root  signal  has  added  noise,  then  it  may  or  may 
not  be  possible  to  remove  the  noise  by  filtering. 
Noise  that  can  be  filtered  is  noise  that  changes 
the  signal  in  such  a  way  that  the  noisy  signal  is 
an  ancestor  of  the  same  root.  This  noise  can  be 
removed  with  repeated  filtering.  However,  if  the 
noisy  signal  is  now  the  ancestor  of  a  different 
root,  then  it  can  not  be  removed  by  repeated 
filtering.  This  property  of  either  perfect  signal 
recovery  or  false  signal  recovery  points  to  yet 
another  application  of  median  filtering  -  channel 


coding.  For  this  application  the  root  set  R 
corresponds  to  an  alphabet  set.  The  transmitted 
code  can  contain  either  roots  or  ancestors.  In 
either  case  decoding  is  accomplished  through  re¬ 
peated  filtering. 

In  this  paper  we  have  established  several  fun¬ 
damental  theoretical  properties  of  median  filters. 
We have presented necessary and sufficient conditions for a
signal to be invariant to median filtering and we call these
signals roots of the filter. We have also shown that repeated
filtering of any signal results in a root signal and
have  established  the  maximum  number  of  filtering 
operations  required  to  reach  a  root.  As  a  result 
of  the  theory  developed  in  this  paper  a  better 
understanding  of  the  potential  applications  as 
well  as  the  limitations  of  these  filters  is 
achieved. 


References 


1. T. S. Huang, G. J. Yang, and G. Y. Tang, "A Fast Two
Dimensional Median Filtering Algorithm", IEEE Trans. Acoust.,
Speech, Signal Processing, Vol. ASSP-27, pp. 13-18, Feb. 1979.

2. N. S. Jayant, "Average- and Median-Based Smoothing
Techniques for Improving Digital Speech Quality in the Presence
of Transmission Errors", IEEE Trans. Commun., Vol. COM-24, pp.
1043-1045, Sept. 1976.

3. L. R. Rabiner, M. R. Sambur, and C. E. Schmidt,
"Applications of a Nonlinear Smoothing Algorithm to Speech
Processing", IEEE Trans. Acoust., Speech, Signal Processing,
Dec. 1975.


ROOT-SIGNAL  SET  ANALYSIS  FOR  MEDIAN  FILTERS 


G.  R.  Arce  and  N.  C.  Gallagher,  Jr. 

School  of  Electrical  Engineering 

Purdue University

West Lafayette, Indiana 47907

ABSTRACT 

Invariant signals to median filters are called roots of the
filter. A tree structure for the roots of a binary signal is
obtained, showing a state propagation property from which a
state diagram is obtained. The number of roots R(n) for a
signal of length n and a filter window of size 2s-1 is exactly
represented by the difference equation R(n) = R(n-1) + R(n-s).
A general solution is obtained in a Z domain approach, and in a
transformation approach.
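The difference equation can be checked by brute force for short binary signals. The sketch below is an illustration under stated assumptions, not the authors' tree construction: the function names are hypothetical, and the signal ends are extended with s-1 copies of the first and last sample before filtering.

```python
from itertools import product


def is_root(signal, n):
    """True if a (2n+1)-point median filter leaves the signal unchanged.

    Assumption: n copies of each end value are appended before filtering.
    """
    padded = (signal[0],) * n + tuple(signal) + (signal[-1],) * n
    return all(sorted(padded[i:i + 2 * n + 1])[n] == signal[i]
               for i in range(len(signal)))


def count_roots(length, s):
    """Number of binary root signals of the given length for window 2s-1."""
    n = s - 1
    return sum(is_root(bits, n) for bits in product((0, 1), repeat=length))


if __name__ == "__main__":
    for s in (2, 3):
        counts = [count_roots(length, s) for length in range(1, 11)]
        print(f"window {2 * s - 1}: R(1..10) = {counts}")
```

With these conventions the counts satisfy R(n) = R(n-1) + R(n-s) for every n > s that the enumeration reaches.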


SUMMARY 


Many  properties  of  a  median  filter  may  be  described  in  terms  of  the 
so  called  root  signals.  A  signal  invariant  to  the  filter  is  called  a  root. 

In  this  paper,  a  tree  structure  of  the  roots  is  modeled  and  implemen¬ 
ted  graphically.  This  structure  has  very  attractive  properties  such  as  sym¬ 
metry  as  well  as  a  predictable  pattern  of  state  propagation.  Each  state  in 
the  tree  generates  other  states,  not  necessarily  of  the  same  kind;  then,  the 
new  states  generate  another  group  of  states  and  so  the  tree  structure 
follows.  The  repetition  of  states  in  a  tree  is  a  function  of  the  length  of 
the  signal,  and  the  number  of  different  kinds  of  states  is  a  function  of  the 
filter  window  size.  At  each  stage  along  the  tree,  each  state  yields  a  num¬ 
ber  of  roots. 

On the binary signal we obtain 4 different states; states A & D
yield 2 roots, and states B & C yield 1 root each. The relation
for the number of roots is: R(n+1) = 2(A(n)+D(n)) + B(n) + C(n),
where n represents the signal length. For the binary case the
difference equation for R(n) can be shown to be:
R(n+s) - R(n+s-1) = R(n), where s depends on the window size.
The solution of the difference equation is obtained with a
state equation approach. Let: R(k) = X1(k); R(k+1) = X2(k);
R(k+2) = X3(k); ...; R(k+s-1) = Xs(k). By solving the vector
state equation x(k+1) = [A] x(k) we obtain the solution:
R(k) = [1 0 0 0 ... 0] x(k). Therefore a solution for [A]^k is
necessary, where the A matrix has the form of a bottom
companion matrix¹. The characteristic polynomial for the A
matrix of size s by s is: f(λ) = λ^s - λ^(s-1) - 1. Using
Sturm's theorem, we can see that the characteristic function
has distinct eigenvalues only. Two different approaches are
used to obtain A^k. One approach uses the Z domain,
A^k = Z^(-1){(zI-A)^(-1) z}; the other approach uses a
similarity transformation x = Mq, where A^k = M D^k M^(-1) and
D = M^(-1) A M. A closed form solution is then obtained showing
that the number of roots for a signal of length k is a linear
combination of the eigenvalues raised to the kth power. The Z
domain approach yields the result:


$$R(k_0) = \left. Z^{-1}\left\{ \frac{\left[\, z^{s-1}(z-1),\; z^{s-2}(z-1),\; \ldots,\; z(z-1),\; z \,\right]}{z^{s} - z^{s-1} - 1} \right\} \right|_{k=k_0} x(0)$$


where k_0 is a specific signal length. In the paper we analyze
in detail every point touched in this summary.
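The state equation approach can be illustrated with a few lines of integer arithmetic. This is a sketch under assumptions, not the closed-form solution of the paper: the function names are hypothetical, and the initial state [R(1), ..., R(s)] = [2, 4, ..., 2s] is an assumed starting point corresponding to binary signals whose ends are extended by appended constant values.

```python
def companion(s):
    """Bottom companion matrix for R(k+s) = R(k+s-1) + R(k), assuming s >= 2.

    Its characteristic polynomial is lambda^s - lambda^(s-1) - 1.
    """
    a = [[0] * s for _ in range(s)]
    for i in range(s - 1):
        a[i][i + 1] = 1          # shift: X_i(k+1) = X_(i+1)(k)
    a[s - 1][0] = 1              # coefficient of R(k)
    a[s - 1][s - 1] = 1          # coefficient of R(k+s-1)
    return a


def mat_vec(a, x):
    """Multiply a square matrix by a vector."""
    return [sum(r * c for r, c in zip(row, x)) for row in a]


def roots_via_state_equation(k, s):
    """R(k) as the first component of A^(k-1) applied to the initial state."""
    x = [2 * (i + 1) for i in range(s)]   # assumed [R(1), ..., R(s)]
    a = companion(s)
    for _ in range(k - 1):
        x = mat_vec(a, x)
    return x[0]


if __name__ == "__main__":
    print([roots_via_state_equation(k, 2) for k in range(1, 9)])
```

Repeated multiplication by A is the simplest way to evaluate A^k; the eigenvalue expansion described in the summary gives the same values in closed form.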

The  authors  gratefully  acknowledge  the  support  of  the  Air  Force  Office 
of  Scientific  Research  under  grant  AFOSR  78-3605. 

Presented at the Eighteenth Annual Allerton Conference on Communication,
Control, and Computing, October 8-10, 1980.




IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-26, NO. 5, SEPTEMBER 1980


Some  Properties  of  Uniform  Step  Size  Quantizers 

JAMES A. BUCKLEW, MEMBER, IEEE, AND NEAL C. GALLAGHER, JR.,
MEMBER, IEEE

Abstract: Some properties of the optimal mean-square error
uniform quantizer are treated. It is shown that the mean-square
error (mse) is given by the input variance minus the output
variance. Furthermore $\lim_{N\to\infty} \mathrm{mse}/(\Delta^2/12) \ge 1$,
where N is the number of output levels and Δ (a function of N)
is the step size of the uniform quantizer, with equality when
the support of the random variable is contained in a finite
interval. A class of probability densities is given for which
the above limit is greater than one. It is shown that
$\lim_{N\to\infty} N^2\,\mathrm{mse} = (b-a)^2/12$, where (b-a) is
the measure of the smallest interval that contains the support
of the input random variable.

In many problems arising in the evaluation or design of a
control  or  communication  system  it  is  necessary  to  predict  the 
performance  of  a  uniform  quantizer.  Uniform  quantizers  are  of 
interest  because  they  are  usually  the  simplest  to  implement  and 
because  many  noise  processes  in  physical  systems  may  be  con¬ 
sidered  as  the  noise  produced  by  a  uniform  quantizing  opera¬ 
tion.  For  example,  the  final  position  of  a  stepping  motor  or  the 
line  drawn  by  the  pen  of  a  computer  plotting  device  under  a 
continuous  control  may  be  considered  to  be  corrupted  by  a 
uniform  quantizing  operation. 

Because of the importance of these quantizers several authors
have considered their properties. Widrow [1] shows that under
certain conditions the quantization noise is uniformly
distributed. Gish and Pierce [2] show that asymptotically the
uniform quantizer is optimum in the sense of minimizing the
output entropy subject to a fixed mean-square error. Morris and
Vandelinde [3] show the uniform quantizer to be minimax. Sripad
and Snyder [4] later extended Widrow's work to give a
sufficient condition for the quantization error to be uniform
and uncorrelated with the input.

We now prove some additional properties of these quantizers
when they are designed to minimize the mean-square error (mse).
We may write down the analytic expression for the quantizer
characteristic g(x) as

$$g(x) = \begin{cases} a, & \text{if } x \le q, \\ a + (i+1)\Delta, & \text{if } q + i\Delta < x \le q + (i+1)\Delta, \quad \text{for } i = 0, \ldots, N-3, \\ a + (N-1)\Delta, & \text{if } x > (N-2)\Delta + q, \end{cases} \tag{1}$$

where N is the number of output levels. We see that if x is
less than q or greater than q + (N-2)Δ, x is truncated to a or
a + (N-1)Δ, respectively. An important parameter of interest is
the measure of the nontruncation region, (N-2)Δ.
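The quantizer characteristic can be written as a short function. This is a minimal sketch: the function and argument names are hypothetical, and which side of each breakpoint is closed is an assumption, since that choice does not affect the mean-square error for a density without point masses.

```python
import math


def uniform_quantizer(x, a, q, delta, n_levels):
    """Uniform quantizer characteristic g(x) with n_levels >= 3 output levels.

    a      : lowest output level
    q      : lowest input breakpoint
    delta  : step size
    """
    if x <= q:                                   # lower truncation region
        return a
    if x > q + (n_levels - 2) * delta:           # upper truncation region
        return a + (n_levels - 1) * delta
    # Cell index i such that q + i*delta < x <= q + (i+1)*delta.
    i = math.ceil((x - q) / delta) - 1
    i = min(max(i, 0), n_levels - 3)             # guard against round-off
    return a + (i + 1) * delta


if __name__ == "__main__":
    # Assumed example: N = 4, q = 1, delta = 1, a = q - delta/2.
    for x in (0.3, 1.2, 2.0, 2.5, 3.0, 9.0):
        print(x, uniform_quantizer(x, a=0.5, q=1.0, delta=1.0, n_levels=4))
```

In the assumed example the nontruncation region is (1, 3], of measure (N-2)Δ = 2, and inputs outside it are mapped to the extreme levels 0.5 and 3.5.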

The quantizer characteristic g(x) must be optimized with
respect to three parameters: q, which fixes its position along
the x axis; a, which fixes its position along the y axis; and Δ
(a function of N), which specifies the step size of the
quantizer. Because it makes little sense to speak of minimizing
the mean-square error of a random variable with infinite
variance, we will always assume
$\int_{-\infty}^{\infty} x^2 f(x)\,dx < \infty$.

Property 1: The minimum mean-square error uniform quantizer
preserves the mean of the input random variable.


Proof: Suppose g(x) is the optimum uniform quantizer. Then

$$\frac{d}{d\epsilon}\int \left(x - g(x) + \epsilon\right)^2 f(x)\,dx\,\Big|_{\epsilon=0} = 0, \tag{2}$$

which implies

$$\int x f(x)\,dx = \int g(x) f(x)\,dx. \tag{3}$$

□

Property 2: For the optimum uniform quantizer a = q - Δ/2.

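Proof: Suppose g(x) is the optimum uniform quantizer. Then

$$0 = \frac{d}{d\epsilon}\int \left(g(x-\epsilon) - x\right)^2 f(x)\,dx\,\Big|_{\epsilon=0} \tag{4}$$

$$= \frac{d}{d\epsilon}\Bigg[\sum_{i=0}^{N-3}\left(a+(i+1)\Delta\right)^2\int_{q+\epsilon+i\Delta}^{q+\epsilon+(i+1)\Delta} f(x)\,dx + a^2\int_{-\infty}^{q+\epsilon} f(x)\,dx + \left(a+(N-1)\Delta\right)^2\int_{q+\epsilon+(N-2)\Delta}^{\infty} f(x)\,dx$$
$$\qquad - 2\Bigg(\sum_{i=0}^{N-3}\left(a+(i+1)\Delta\right)\int_{q+\epsilon+i\Delta}^{q+\epsilon+(i+1)\Delta} x f(x)\,dx + a\int_{-\infty}^{q+\epsilon} x f(x)\,dx + \left(a+(N-1)\Delta\right)\int_{q+\epsilon+(N-2)\Delta}^{\infty} x f(x)\,dx\Bigg)\Bigg]\Bigg|_{\epsilon=0} \tag{5}$$

$$= \Bigg[\sum_{i=0}^{N-3}\left(a+(i+1)\Delta\right)^2\left[f(q+\epsilon+(i+1)\Delta) - f(q+\epsilon+i\Delta)\right] + a^2 f(q+\epsilon) - \left(a+(N-1)\Delta\right)^2 f(q+\epsilon+(N-2)\Delta)$$
$$\qquad - 2\Bigg(\sum_{i=0}^{N-3}\left(a+(i+1)\Delta\right)\left[(q+\epsilon+(i+1)\Delta) f(q+\epsilon+(i+1)\Delta) - (q+\epsilon+i\Delta) f(q+\epsilon+i\Delta)\right] + a(q+\epsilon) f(q+\epsilon)$$
$$\qquad - \left(a+(N-1)\Delta\right)(q+\epsilon+(N-2)\Delta) f(q+\epsilon+(N-2)\Delta)\Bigg)\Bigg]\Bigg|_{\epsilon=0}. \tag{6}$$

Simplifying this expression we obtain

$$(\Delta + 2a - 2q)\sum_{i=0}^{N-2} f(q+i\Delta) = 0.$$

The solution $\sum_{i=0}^{N-2} f(q+i\Delta) = 0$ corresponds to a
trivial solution because, without affecting the mean-square
error, we may always arbitrarily set $f(q+i\Delta) = 0$,
$i = 0, \ldots, N-2$. Hence $\Delta + 2a - 2q = 0$, which is what
we wish to prove. □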

Property  3:  The  mean-square  error  of  an  optimum  uniform 
quantizer  is  given  by  the  input  variance  minus  the  output  vari¬ 
ance. 


Manuscript received April 23, 1979; revised October 25, 1979. This work
was supported by the Air Force Office of Scientific Research under Grant
AFOSR 78-3605. This paper was presented at the 1979 Allerton Conference
on Information Sciences and Systems, Monticello, IL, October 10-12, 1979.

J. A. Bucklew was with the School of Electrical Engineering, Purdue
University, West Lafayette, IN. He is now with the Electrical and Computer
Engineering Department, University of Wisconsin, Madison, WI 53706.

N. C. Gallagher, Jr. is with the School of Electrical Engineering, Purdue
University, West Lafayette, IN 47907.


Proof:

$$\mathrm{mse} = E\{(g(x)-x)^2\} = E\{x^2\} - 2E\{x g(x)\} + E\{g(x)^2\}. \tag{7}$$

We wish to optimize this expression with respect to Δ. Using


0018-9448/80/0900-0610$00.75 ©1980 IEEE




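a = q - Δ/2 we first obtain

$$E\{x g(x)\} = \sum_{i=0}^{N-3}\left(q+(i+\tfrac{1}{2})\Delta\right)\int_{q+i\Delta}^{q+(i+1)\Delta} x f(x)\,dx + \left(q - \tfrac{\Delta}{2}\right)\int_{-\infty}^{q} x f(x)\,dx + \left(q+(N-\tfrac{3}{2})\Delta\right)\int_{q+(N-2)\Delta}^{\infty} x f(x)\,dx \tag{8}$$

and

$$E\{g(x)^2\} = \sum_{i=0}^{N-3}\left(q+(i+\tfrac{1}{2})\Delta\right)^2\int_{q+i\Delta}^{q+(i+1)\Delta} f(x)\,dx + \left(q - \tfrac{\Delta}{2}\right)^2\int_{-\infty}^{q} f(x)\,dx + \left(q+(N-\tfrac{3}{2})\Delta\right)^2\int_{q+(N-2)\Delta}^{\infty} f(x)\,dx. \tag{9}$$

Substitute (8) and (9) into (7); take the partial derivative
with respect to Δ and set the result equal to zero. We find
that

$$E\{x g(x)\} + qE\{g(x)\} - E\{g(x)^2\} = qE\{x\}, \tag{10}$$

but $E\{g(x)\} = E\{x\}$ for the optimum quantizer. Hence
$E\{x g(x)\} = E\{g(x)^2\}$ and

$$\mathrm{mse} = E\{x^2\} - E\{g(x)^2\}, \tag{11}$$

which together with Property 1 completes the proof. □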
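Properties 1 through 3 can be checked numerically on a simple case. The sketch below is illustrative only: it assumes an input uniformly distributed on (0, 1), for which the N-level quantizer with q = Δ = 1/N and a = q - Δ/2 is taken as the optimum uniform quantizer, and it replaces the integrals by averages over a fine midpoint grid. All names are hypothetical.

```python
import math


def uniform_quantizer(x, a, q, delta, n_levels):
    """Quantizer characteristic g(x) for n_levels >= 3 output levels."""
    if x <= q:
        return a
    if x > q + (n_levels - 2) * delta:
        return a + (n_levels - 1) * delta
    i = min(max(math.ceil((x - q) / delta) - 1, 0), n_levels - 3)
    return a + (i + 1) * delta


def moments_for_uniform_input(n_levels, grid=8000):
    """Input/output means, variances, and mse for a uniform(0,1) input."""
    delta = 1.0 / n_levels
    q = delta                     # assumed optimum breakpoint offset
    a = q - delta / 2.0           # Property 2
    xs = [(k + 0.5) / grid for k in range(grid)]
    gs = [uniform_quantizer(x, a, q, delta, n_levels) for x in xs]
    mean_x = sum(xs) / grid
    mean_g = sum(gs) / grid
    var_x = sum((x - mean_x) ** 2 for x in xs) / grid
    var_g = sum((g - mean_g) ** 2 for g in gs) / grid
    mse = sum((x - g) ** 2 for x, g in zip(xs, gs)) / grid
    return mean_x, mean_g, var_x, var_g, mse


if __name__ == "__main__":
    mean_x, mean_g, var_x, var_g, mse = moments_for_uniform_input(8)
    print("means:", mean_x, mean_g)
    print("mse:", mse, " input variance minus output variance:", var_x - var_g)
```

For this assumed density the two means agree (Property 1) and the mse matches the input variance minus the output variance (Property 3), both up to grid and round-off error.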

Sripad and Snyder [4] show that a sufficient condition for
x - g(x) to be uniform and uncorrelated with x is

$$\Phi_x\!\left(\frac{2\pi l}{\Delta}\right) = \Phi_x'\!\left(\frac{2\pi l}{\Delta}\right) = 0, \qquad l = \pm 1, \pm 2, \ldots, \tag{12}$$

where $\Phi_x(u)$ is the characteristic function of the input
random variable x and $\Phi_x'(u) = d\Phi_x(u)/du$. Frequently in
the analysis of a system corrupted by a uniform quantizing
operation it is assumed that the quantization noise is
uncorrelated with (or sometimes independent of) the input. The
next property demonstrates that this cannot be done with the
optimum uniform quantizer.

Property 4: Suppose the input probability density is
Riemann-integrable. Then the quantization noise is never
uncorrelated with the input for the optimum uniform quantizer.

Proof: Without loss of generality assume E{x} = 0. Suppose the
converse holds. This implies

$$E\{(x - g(x))x\} = E\{x^2\} - E\{g(x)x\} = 0, \tag{13}$$

but from Property 3

$$E\{x g(x)\} = E\{g(x)^2\},$$

so that

$$E\{x^2\} - E\{g(x)^2\} = 0. \tag{14}$$

But, again from Property 3, the left side of (14) is the
mean-square error. This is a contradiction, since a
Riemann-integrable probability density function necessarily
implies that the mean-square error for any finite number of
output levels is greater than zero (i.e., f(x) has no delta
functions). □

We  now  state  an  obvious  property  which  will  be  used  in 
several  subsequent  proofs. 

Property 5: The mean-square error for the optimal uniform
quantizer approaches zero as the number of output levels
approaches infinity.

Proof: The mean-square error is given by $E\{(g(x)-x)^2\}$, and
for this to approach zero it is sufficient that g(x) approach x
in mean-square. Consider a quantizer with the parameters
$\Delta = 1/\sqrt{N-2}$ and $q = -(N-2)\Delta/2$. The width of
the nontruncation region is $(N-2)\Delta = \sqrt{N-2}$. Hence as
N becomes large the width of the nontruncation region
approaches infinity and Δ approaches zero. It is a simple
matter to show that $\lim_{N\to\infty} g(x) = x$ everywhere. Since
$(g(x)-x)^2 \le (|x|+\Delta)^2$ for all N and
$\int (|x|+\Delta)^2 f(x)\,dx < \infty$, this implies

$$\lim_{N\to\infty}\int \left(g(x)-x\right)^2 f(x)\,dx = \int \lim_{N\to\infty}\left(g(x)-x\right)^2 f(x)\,dx = 0$$

by the Lebesgue dominated convergence theorem. This quantizer
is in general suboptimal, which implies that an optimal
quantizer must have even smaller mean-square error for each N,
and hence its error must also go to zero. □

As a consequence of the above property, it is easy to show
$\lim_{N\to\infty}\Delta = 0$ for the optimal uniform quantizer.

Let (a,b) be the smallest interval such that
$\int_a^b f(x)\,dx = 1$. Note that either |a| or |b| may be
infinite.

Property 6: Suppose f(x) is Riemann-integrable. Then, for the
optimum uniform quantizer, $\lim_{N\to\infty}(N-2)\Delta = b - a$.

Proof: Suppose $\lim_{N\to\infty}(N-2)\Delta < b - a$. This
implies that for N sufficiently large we are always truncating
some finite amount of probability mass, and so the mean-square
error cannot go to zero. This contradicts the previous
property. Hence $\lim_{N\to\infty}(N-2)\Delta \ge b - a$.

Suppose $\lim_{N\to\infty}(N-2)\Delta > b - a$. This makes sense
only if the random variable is of finite support. So for N
large enough there is no truncation error. In the Appendix it
is shown that for a family of quantizers with no truncation
error $\lim_{N\to\infty}\mathrm{mse}/(\Delta^2/12) = 1$ for a
Riemann-integrable density function. So, for N sufficiently
large, $(N-2)\Delta > C > b - a$, where $b - a < \infty$. Then

$$1 = \lim_{N\to\infty}\frac{\mathrm{mse}}{\Delta^2/12} \le \lim_{N\to\infty}\frac{\mathrm{mse}}{C^2/12(N-2)^2}, \quad\text{or}\quad \frac{C^2}{12} \le \lim_{N\to\infty}(N-2)^2\,\mathrm{mse}. \tag{15}$$

Consider a suboptimal quantizer whose input intervals are
obtained by dividing the interval (a,b) into N-2 equal
subintervals. Denote the mean-square error of this quantizer by
$\mathrm{mse}_{SUB}$ and its step size by
$\Delta_S = (b-a)/(N-2)$. This quantizer has no truncation error
and hence

$$1 = \lim_{N\to\infty}\frac{\mathrm{mse}_{SUB}}{\Delta_S^2/12} = \lim_{N\to\infty}\frac{\mathrm{mse}_{SUB}}{(b-a)^2/12(N-2)^2},$$

or

$$\lim_{N\to\infty}(N-2)^2\,\mathrm{mse}_{SUB} = \frac{(b-a)^2}{12} < \frac{C^2}{12} \le \lim_{N\to\infty}(N-2)^2\,\mathrm{mse}, \tag{16}$$

which is a contradiction since we have found a suboptimal
quantizer with a better mean-square error than the optimal one.

□

Bennett [5] shows that the mean-square error of a uniform
quantizer is approximately Δ²/12, assuming that the truncation
error is negligible. This is not always the case and in the
discussion we will give examples for which Bennett's
approximation may be very poor indeed. There are some special
cases where Bennett's approximation does hold. The next
property deals with one such case.

Property 7: Suppose the density function is Riemann-integrable
and b - a < ∞. Then for the optimal uniform quantizer we have

$$\lim_{N\to\infty}\frac{\mathrm{mse}}{\Delta^2/12} = 1.$$


Proof: From Property 6,
$\lim_{N\to\infty}(N-2)\Delta_O = b - a < \infty$, where
$\Delta_O$ is the optimum Δ. We may design a suboptimum
quantizer by dividing the interval (a,b) into N-2 equal
subintervals and using these subintervals as the breakpoints
for our quantizer. We denote the mean-square error associated
with this quantizer by $\mathrm{mse}_{SUB}$ and the step size by
$\Delta_S = (b-a)/(N-2)$. This quantizer has no truncation
error. Hence from the Appendix

$$\lim_{N\to\infty}\frac{\mathrm{mse}_{SUB}}{\Delta_S^2/12} = 1.$$

Also

$$\lim_{N\to\infty}\frac{\Delta_S}{\Delta_O} = \lim_{N\to\infty}\frac{(N-2)\Delta_S}{(N-2)\Delta_O} = \frac{\lim_{N\to\infty}(N-2)\Delta_S}{\lim_{N\to\infty}(N-2)\Delta_O},$$

implying $\lim_{N\to\infty}\Delta_S/\Delta_O = 1$. For any
quantizer whose nontruncation region covers the support of the
Riemann-integrable density function in the limit as N
approaches infinity, we show in the Appendix that
$\lim_{N\to\infty}\mathrm{mse}/(\Delta^2/12) \ge 1$. This bound is
arrived at by ignoring the truncation error and is true for
density functions with finite or infinite support. Then

$$1 = \lim_{N\to\infty}\frac{\mathrm{mse}_{SUB}}{\Delta_S^2/12} = \lim_{N\to\infty}\left(\frac{\mathrm{mse}_{SUB}}{\Delta_O^2/12}\right)\left(\frac{\Delta_O^2/12}{\Delta_S^2/12}\right) = \left(\lim_{N\to\infty}\frac{\mathrm{mse}_{SUB}}{\Delta_O^2/12}\right)\left(\lim_{N\to\infty}\frac{\Delta_O^2/12}{\Delta_S^2/12}\right)$$
$$\quad = \lim_{N\to\infty}\frac{\mathrm{mse}_{SUB}}{\Delta_O^2/12} \ge \lim_{N\to\infty}\frac{\mathrm{mse}_{OPTIMAL}}{\Delta_O^2/12} \ge 1,$$

which is what we wanted to prove. □
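The suboptimal quantizer used in the proof can be evaluated exactly for a simple density. The sketch below is an illustration under assumptions: the density f(x) = 3x² on (0, 1) is an assumed example, each of the N-2 cells reproduces its midpoint, and the function name is hypothetical. Because the support is covered exactly, there is no truncation error.

```python
def bennett_ratio(n_levels):
    """mse / (delta^2 / 12) for N-2 equal cells on (0,1), density f(x) = 3x^2."""
    cells = n_levels - 2
    delta = 1.0 / cells

    def antiderivative(x, c):
        # Antiderivative of (x - c)^2 * 3 x^2 with respect to x.
        return 3.0 * (x ** 5 / 5.0 - c * x ** 4 / 2.0 + c * c * x ** 3 / 3.0)

    mse = 0.0
    for i in range(cells):
        lo, hi = i * delta, (i + 1) * delta
        c = 0.5 * (lo + hi)                  # output level = cell midpoint
        mse += antiderivative(hi, c) - antiderivative(lo, c)
    return mse / (delta ** 2 / 12.0)


if __name__ == "__main__":
    for n_levels in (12, 102, 1002):
        print(n_levels, bennett_ratio(n_levels))
```

For this assumed density the ratio works out to 1 + Δ²/5, so it approaches 1 as N grows, in line with the limit used in the proof.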

In the above property we have shown that the truncation error
is negligible for the optimum uniform quantizer, if the density
function has finite support. This is not true, however, for
arbitrary uniform quantizers on these densities. It is easy to
design a sequence of uniform quantizers (indexed by N) such
that $\lim_{N\to\infty}\mathrm{mse} = 0$,
$\lim_{N\to\infty}\Delta = 0$ but
$\lim_{N\to\infty}\mathrm{mse}/(\Delta^2/12) \ne 1$. Zador [6]
shows that if f(x) is Riemann-integrable and
$\int |x|^{2+\delta} f(x)\,dx < \infty$ for some δ > 0 then for
the optimal nonuniform quantizer

$$\lim_{N\to\infty} N^2\,\mathrm{mse} = \|f\|_{1/3}/12,$$

where $\|f\|_{1/3}$ is the $L_{1/3}$ norm. This result shows that
for the nonuniform quantizer the mean-square error decreases
like 1/N² for large N. Is there a similar property for the
optimum uniform quantizer? Not always.

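Property 8: Suppose f(x) is Riemann-integrable. Then for the
optimum uniform quantizer
$\lim_{N\to\infty} N^2\,\mathrm{mse} = (b-a)^2/12$.

Proof: If b - a = ∞ then

$$1 \le \lim_{N\to\infty}\frac{\mathrm{mse}}{\Delta^2/12} = \lim_{N\to\infty}\frac{(N-2)^2\,\mathrm{mse}}{(N-2)^2\Delta^2/12} = \frac{\lim_{N\to\infty}(N-2)^2\,\mathrm{mse}}{\lim_{N\to\infty}(N-2)^2\Delta^2/12},$$

but $(N-2)^2\Delta^2 \to \infty$, which implies
$\lim_{N\to\infty}(N-2)^2\,\mathrm{mse} \to \infty$.

If b - a < ∞ then
$\lim_{N\to\infty}\mathrm{mse}/(\Delta^2/12) = 1$, or
$\lim_{N\to\infty}(N-2)^2\,\mathrm{mse} = \lim_{N\to\infty}N^2\,\mathrm{mse} = (12)^{-1}\lim_{N\to\infty}(N-2)^2\Delta^2 = (b-a)^2/12$,
which completes the proof. □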

Discussion 

We should note that not everyone uses our definition of the
optimum uniform quantizer. For example, Pearlman and Senge [7]
have published tables of the optimal uniform Rayleigh
quantizer. For their computations they add the constraints
a = 0 and q = Δ/2.


Fig. 1. K (solid line) and D(N) (dashed line) plotted as a function of
log10(N).

It is interesting to note that Properties 1 and 3 are also
shared by the optimal nonuniform quantizer as shown in [8]. As
a further consequence of these two properties we find that, for
the N = 2 case, the optimum uniform quantizer and the optimum
nonuniform quantizer are identical.

Property 7 is one of the more interesting properties proved in
this correspondence. A common approximation to the mean-square
error of a uniform quantizer has been Δ²/12. Consider the class
of density functions given by

K) 

We easily see that
$\delta = \sup\{\epsilon : \int x^{2+\epsilon} f(x)\,dx < \infty\}$.
By straightforward minimization techniques one can show for
this class of densities that

$$\lim_{N\to\infty}\frac{\mathrm{mse}}{\Delta^2/12} = 1 + \frac{2}{\delta}.$$

Property 8 is of interest because it sets forth a basic
difference between uniform and nonuniform quantizers. For the
nonuniform quantizer we can expect the mean-square error to be
of the order of 1/N². We can expect this rate of convergence to
zero to hold for the uniform quantizer only if the probability
density has finite support. As an example consider the Gaussian
case. The Gaussian probability density is of infinite support
yet has extremely light tails. We may write down an expression
for the mean-square error of a Gaussian random variable and
solve for the optimum Δ for a specific N. Let us set
Δ = 2σK/(N-2), where K is a function of N and σ is the standard
deviation. We find that, for large N, K is given by the
following transcendental equation:


)’"*(' 


K1 

N- 2)  / 


1 




This equation may be solved on a computer by a standard
Newton-Raphson search. In Fig. 1 we plot K as a function of N
for values of N from 10 to 1000000. The dotted line is put in
as a
reference  and  is  given  by  D{N)<"  l.7in36A//ir.  It  can  be  shown 
that $\lim_{N\to\infty} D(N)/K < \infty$. We conclude that the
mean-square error in a uniform Gaussian quantizer is of the
same or larger order than (ln N)/
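The slower convergence for the Gaussian case can be observed by direct numerical optimization. The sketch below is not the Newton-Raphson solution of the transcendental equation described above; it is a hedged illustration that assumes a zero-mean, unit-variance Gaussian input, a quantizer placed symmetrically about zero with a = q - Δ/2, and a grid scan followed by golden-section refinement over Δ. All function names are hypothetical.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def pdf(x):
    """Standard normal density, taken as zero at infinite arguments."""
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / SQRT_2PI


def cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cell_error(lo, hi, c):
    """Integral of (x - c)^2 times the normal density over (lo, hi)."""
    x_pdf = lambda x: 0.0 if math.isinf(x) else x * pdf(x)
    m0 = cdf(hi) - cdf(lo)                  # integral of f
    m1 = pdf(lo) - pdf(hi)                  # integral of x f
    m2 = m0 + x_pdf(lo) - x_pdf(hi)         # integral of x^2 f
    return m2 - 2.0 * c * m1 + c * c * m0


def mse(n_levels, delta):
    """Mean-square error of the symmetric N-level uniform quantizer."""
    q = -0.5 * (n_levels - 2) * delta
    edges = [-math.inf] + [q + i * delta for i in range(n_levels - 1)] + [math.inf]
    return sum(cell_error(edges[j], edges[j + 1], q + (j - 0.5) * delta)
               for j in range(n_levels))


def optimal_delta(n_levels):
    """Step size minimizing the mse: coarse scan, then golden-section search."""
    grid = [8.0 / n_levels * 1.05 ** k for k in range(-80, 41)]
    best = min(range(1, len(grid) - 1), key=lambda k: mse(n_levels, grid[k]))
    lo, hi = grid[best - 1], grid[best + 1]
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(60):
        x1, x2 = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
        if mse(n_levels, x1) < mse(n_levels, x2):
            hi = x2
        else:
            lo = x1
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n_levels in (8, 64, 512):
        delta = optimal_delta(n_levels)
        print(n_levels, delta, n_levels ** 2 * mse(n_levels, delta))
```

Under these assumptions the product N² mse keeps growing with N instead of settling to a constant, which is the behavior Property 8 predicts for a density of infinite support.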


Appendix 

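Consider a sequence of quantizers $\{g_N(x)\}_{N=1}^{\infty}$,
where N is the number of output levels, $\Delta_N$ is the step
size, and $I_N$ is the nontruncation region of $g_N(x)$. The
measure of $I_N$ is $(N-2)\Delta_N$. Suppose the input
probability density function f(x) is Riemann-integrable, and
denote the support of f(x) by supp f. Define
$\mathrm{mse}_N = E\{(x - g_N(x))^2\}$.

Lemma 1: Suppose $I_N \to \mathrm{supp}\, f$ as $N \to \infty$
(i.e., if $x \in \mathrm{supp}\, f$ then there exists an $N_0$
such that $x \in I_n$ for $n \ge N_0$) and
$\lim_{N\to\infty}\Delta_N = 0$. Then
$\lim_{N\to\infty}\mathrm{mse}_N/(\Delta_N^2/12) \ge 1$.
Furthermore if $\mathrm{supp}\, f \subset I_N$ for all N and
$\lim_{N\to\infty}\Delta_N = 0$ then
$\lim_{N\to\infty}\mathrm{mse}_N/(\Delta_N^2/12) = 1$.

Proof: Define

$$M_i = \sup_{x \in [q+i\Delta_N,\; q+(i+1)\Delta_N)} f(x), \qquad m_i = \inf_{x \in [q+i\Delta_N,\; q+(i+1)\Delta_N)} f(x).$$

Then

$$\sum_{i=0}^{N-3} m_i \int_{q+i\Delta_N}^{q+(i+1)\Delta_N} \left(x - g_N(x)\right)^2 dx \le \mathrm{mse}_N \le \sum_{i=0}^{N-3} M_i \int_{q+i\Delta_N}^{q+(i+1)\Delta_N} \left(x - g_N(x)\right)^2 dx + TE_N,$$

where $TE_N$ is the truncation error. Thus

$$\frac{\Delta_N^2}{12}\sum_{i=0}^{N-3} m_i \Delta_N \le \mathrm{mse}_N \le \frac{\Delta_N^2}{12}\sum_{i=0}^{N-3} M_i \Delta_N + TE_N.$$

If $I_N \to \mathrm{supp}\, f$ as $N \to \infty$ and
$\lim_{N\to\infty}\Delta_N = 0$ then, since f(x) is
Riemann-integrable,
$\lim_{N\to\infty}\sum_{i=0}^{N-3} m_i \Delta_N = 1$, which proves
the first part of the lemma. If $\mathrm{supp}\, f \subset I_N$
for every N then $TE_N = 0$ for every N, and since
$\lim_{N\to\infty}\Delta_N = 0$ and f(x) is Riemann-integrable,
again $\lim_{N\to\infty}\sum_{i=0}^{N-3} M_i \Delta_N = 1$, which
proves the second part of the lemma. □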


References 

[1] B. Widrow, "Statistical analysis of amplitude quantized sampled data
systems," Trans. AIEE Applications and Industry, pt. II, vol. 79, pp.
555-568, Jan. 1960.

[2] H. Gish and J. N. Pierce, "Asymptotically efficient quantizing," IEEE
Trans. Inform. Theory, vol. IT-14, pp. 676-683, Sept. 1968.

[3] J. M. Morris and V. D. Vandelinde, "Robust quantization of discrete-time
signals with independent samples," IEEE Trans. Commun., vol. COM-22,
no. 12, pp. 1897-1901, Dec. 1974.

[4] A. B. Sripad and D. L. Snyder, "A necessary and sufficient condition for
quantization errors to be uniform and white," IEEE Trans. Acoustics,
Speech, and Signal Processing, vol. ASSP-25, pp. 442-448, Oct. 1977.

[5] W. R. Bennett, "Spectra of quantized signals," Bell Syst. Tech. J., vol.
27, pp. 446-472, 1948.

[6] P. Zador, "Development and evaluation of procedures for quantizing
multivariate distributions," Ph.D. dissertation, Stanford Univ., Stanford,
CA, 1964.

[7] W. A. Pearlman and G. H. Senge, "Optimal quantization of the Rayleigh
probability distribution," IEEE Trans. Commun., vol. COM-27, pp.
101-112, Jan. 1979.

[8] J. A. Bucklew and N. C. Gallagher, Jr., "A note on optimum quantization,"
IEEE Trans. Inform. Theory, vol. IT-25, pp. 365-366, May 1979.


SOME MODIFICATIONS TO THE MEDIAN FILTER PROCESS AND THEIR PROPERTIES†


T.  A.  NODES  AND  N.  C.  GALLAGHER,  JR. 
School  of  Electrical  Engineering 
Purdue  University 
W.  Lafayette,  IN  47907 


ABSTRACT 

Some modifications of the median filter are given and their properties
are derived. In addition, some results for standard median filters are
given. It is shown that for non-median nth ranked-order operations,
repeated application of the operation will reduce any signal to a constant.
Also, it is proved that the output of a recursive median filter is
invariant to subsequent passes by the same filter.

I.  INTRODUCTION 

Median filtering, a method of signal processing which is easily
implemented on a digital computer, has been used with success in many
applications. These applications include picture processing and speech
processing [1,2,3,4], where it is employed to smooth the signal. Further
potentially useful properties can be obtained from slight modifications of
the median process. We have investigated several such modifications and
present the properties of two of them. In section II, we look at the nth
ranked-order operation, which is a generalization of the median process.
In section III, we study the recursive median operation, which incorporates
previous output values into the median decision process. Finally, in
section IV we introduce some other possible modifications to median
filters. First, however, a review of the standard median filter is in order.

Median  filtering  is  a  discrete  time  process  in  which  a  2N+1  points  wide 
window  is  stepped  across  an  input  signal  (see  Fig.  1).  At  each  step,  the 
points  inside  the  window  are  ranked  according  to  their  values,  and  the 
median  value  (mid-point)  of  the  ranked  set  is  taken  as  the  output  value  of 
the  filter  for  each  window  position.  At  both  ends  of  the  signal,  N  end 
points  are  appended  to  allow  the  filter  to  reach  the  edges  of  the  signal. 


The output of the median filter, Y(A), is given by

Y(A) = the median value of {x(A-N), ..., x(A-1), x(A), x(A+1), ..., x(A+N)}

Figure 1: The Median Filter


†The authors gratefully acknowledge the support of the Air Force Office of
Scientific Research under Grant AFOSR 78-3605.


Presented at the Eighteenth Annual Allerton Conference on Communication,
Control and Computing, October 8-10, 1980.


Figure  2:  Effects  of  window  size  on  a  median  filtered  signal 

The value of the front endpoints is equal to the value of the first point
of the signal, and the value of the rear endpoints is equal to the value of
the last point of the signal. As an example of this process, consider
Fig. 2. Here, a binary signal of length eleven (the appended endpoints are
marked with a separate symbol) is median filtered with three different
window widths: N = 1 (2N+1 = 3), N = 2 (2N+1 = 5), and N = 3 (2N+1 = 7).
Notice that for the N = 1 case the signal is unperturbed, while for the
N = 2 and N = 3 cases the amount of structure in the signal is reduced. A
number of signal structures that are used to describe the properties of
median filters can now be defined.
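The windowing and endpoint rules described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `median_filter` and the binary test signal are assumptions made here and are not taken from Fig. 2.

```python
def median_filter(x, N):
    """Standard median filter with window width 2N+1.

    N copies of the first value are appended at the front and N copies of
    the last value at the rear, following the endpoint rule in the text.
    """
    padded = [x[0]] * N + list(x) + [x[-1]] * N
    out = []
    for A in range(len(x)):
        window = sorted(padded[A:A + 2 * N + 1])
        out.append(window[N])  # middle element of the 2N+1 ranked values
    return out


# Hypothetical binary test signal (not the signal of Fig. 2).
signal = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0]
for N in (1, 2, 3):
    print(N, median_filter(signal, N))
```

With N = 1 only the isolated single-point pulse is removed; with N = 2 the two-point pulse is removed as well, since both are then no wider than N points.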

A  constant  neighborhood  is  a  region  of  at  least  N+1  consecutive  points 
all  of  which  are  identically  valued. 

An  edge  is  a  monotonically  rising  or  falling  set  of  points  surrounded 
on  both  sides  by  constant  neighborhoods. 

An impulse is a set of N or fewer points whose values are different
from the surrounding regions and whose surrounding regions are
identically valued constant neighborhoods.

A  root  is  a  signal  which  is  not  modified  by  filtering. 

Gallagher and Wise [5] have shown that, while impulses are eliminated by
median filtering, constant neighborhoods and edges are unperturbed, and in
fact, only signals composed solely of constant neighborhoods and edges are
roots of the median filter. Again referring to Fig. 2, note that the
signal is a root of the N = 1 median filter but not of filters with N
greater than one. However, after one pass of the N = 2 filter or two passes
of the N = 3 filter, the resulting outputs are roots of their respective
filters. In fact, Gallagher and Wise have also proven that any signal of
length L is reduced to a root after at most (L-2)/2 successive passes of
any median filter. Furthermore, any root of a median filter with a
particular window size is also a root of any median filter with a smaller
window size.
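As a small illustration of reduction to a root, the sketch below applies a median filter repeatedly until the signal stops changing and reports how many passes altered it. The helper names and the test signals are assumptions made for this example only.

```python
def median_filter(x, N):
    """Median filter of width 2N+1 with first/last values appended at the ends."""
    padded = [x[0]] * N + list(x) + [x[-1]] * N
    return [sorted(padded[A:A + 2 * N + 1])[N] for A in range(len(x))]


def passes_to_root(x, N):
    """Filter repeatedly until nothing changes; return (root, number of passes)."""
    current, passes = list(x), 0
    while True:
        filtered = median_filter(current, N)
        if filtered == current:
            return current, passes
        current, passes = filtered, passes + 1


root, passes = passes_to_root([0, 1, 0, 0, 1, 1, 0, 1, 1, 1], N=2)
print(root, passes)
# A root of the N = 2 filter should also be a root of the N = 1 filter.
print(median_filter(root, 1) == root)
```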




II.  Nth  RANKED-ORDER  OPERATIONS 

If, instead of the median valued point, the value of the nth largest point
in the filter window is passed to the output at each step, then a general
set of operations, called nth ranked-order operations, is found. More
formally, the output of the nth ranked-order operation at position A is

Y(A) = the nth largest value of {x(A-N), ..., x(A-1), x(A), x(A+1), ..., x(A+N)}

This set of operations includes the median filter case, n = N+1, and many of
the properties for all values of n are similar to the properties of the
median filter. The non-median nth ranked-order operations have potential
applications in areas such as peak detection with impulse rejection and
digital A.M. detection (see Fig. 3).

The nth ranked-order operation can also be defined by the decision rule
used to select the output value at each step. For 2N+1 points inside the
window, the nth ranked point, x(a), is the point such that there are at
least n points with values less than or equal to x(a) and at least
2N+1-(n-1) = 2N+2-n points with values greater than or equal to x(a). Note
that under this rule the rank n is counted upward from the smallest value
in the window, so that n = 2N+1 selects the window maximum. A number of
properties of the nth ranked-order operation can now be developed.
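The decision rule can be turned into a short sketch. The function name `ranked_order_filter` and the test values are assumptions for illustration. The rank is counted from the smallest value so that the output satisfies the rule stated above, and the endpoint handling is assumed to be the same as for the median filter.

```python
def ranked_order_filter(x, n, N):
    """nth ranked-order operation with window width 2N+1 (1 <= n <= 2N+1).

    Ranks are counted from the smallest value, so at least n window points
    are <= the output and at least 2N+2-n window points are >= the output.
    """
    if not 1 <= n <= 2 * N + 1:
        raise ValueError("n must lie between 1 and 2N+1")
    padded = [x[0]] * N + list(x) + [x[-1]] * N
    return [sorted(padded[A:A + 2 * N + 1])[n - 1] for A in range(len(x))]


x = [3, 1, 4, 1, 5, 9, 2, 6]
print(ranked_order_filter(x, 1, 1))  # n = 1: window minimum
print(ranked_order_filter(x, 2, 1))  # n = N+1: median filter
print(ranked_order_filter(x, 3, 1))  # n = 2N+1: window maximum
```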

Property 1: A point, x(t), is unchanged (y(t) = x(t)) by an nth ranked-order
operation if two conditions are met: the point x(t) is located in a
constant region, and the position of x(t) is restricted to
b+N-a ≤ t ≤ c-[|N+1-n|+a], where a is any nonnegative integer of value less
than N+1-|N+1-n| and b and c are the positions of the two endpoints of the
constant region.

Proof; 

Assume that the two conditions given above are met. Now, let a = 0.
The constant region must now extend to at least N points left
(decreasing t) of x(t) and |N+1-n| points right of x(t), for a total of at
least 1+N+|N+1-n| points of value x(t) inside the window. Furthermore,
if a ≠ 0, then the constant region will extend a fewer points
to the left of x(t) but a more points to the right, thus maintaining
a total of at least N+1+|N+1-n| constant valued points inside the
window. This means that if N+1 ≥ n then at least
1+N+|N+1-n| = 2N+2-n (≥ n) points inside the window have values equal


Figure 3: A.M. detection of a 5 kHz tone on a 31 kHz carrier, sampled
at 250 kHz, using an 8th ranked-order operation with a window
size of 9

(a) original signal (b) signal corrupted with impulse noise


to x(t). Thus, x(t) meets the decision rule, and y(t) = x(t). Likewise,
if N+1 < n, then 1+N+|N+1-n| = n (≥ 2N+2-n), and again y(t) = x(t).

Property 2: A rising impulse-like signal of width less than 2N+2-n points
or a falling impulse-like signal of width less than n points will be
eliminated.

Proof: 

i) If a rising impulse has fewer than 2N+2-n points, then no point
of the impulse can ever meet the second decision criterion. Thus, no
output points will have values equal to the value of the impulse.

ii) Likewise, if a falling impulse has fewer than n points, then
no point of the impulse can ever meet the first decision criterion,
and no output points will have values equal to the value of the
impulse.

The definitions previously given for the median case may now be
generalized for all the nth ranked-order cases.

A constant neighborhood is a region of at least N+1+|N+1-n|
consecutive points all of which are identically valued.

An impulse is a set of points whose values are different from the
surrounding regions and whose surrounding regions are identically
valued constant neighborhoods. If the values of this set of
points are greater than the surrounding neighborhoods, then the
impulse contains fewer than 2N+2-n points, and if the values of
the impulse are less than the surrounding regions, then the
impulse contains fewer than n points.

The definitions for the edge and the root are unchanged. Note that
property 2 can be restated as "impulses are eliminated by nth ranked-order
operations". Using these definitions, further properties can be developed.
Due to lack of space, however, many of these properties are presented
without proof.

Property 3: Upon each pass of an nth ranked-order operation, every edge of
a signal will be moved to the left (advanced) by

sgn[edge]*(n-N-1) points,

where

sgn[edge] = +1 if x(t) < x(t+1)
            -1 if x(t) > x(t+1)

for t ranging over all positions in the edge.

Property 4: Any constant region of 2N+2-n or more points surrounded by
constant neighborhoods of lesser values will be changed in width by
2*(n-N-1) points after being passed through an nth ranked-order operator.

Any constant region of n or more points surrounded by constant
neighborhoods of greater values will, after being operated on, be changed
in width by 2*(N+1-n) points.

As can be seen from the above properties, for n greater than N+1 the
maximum valued signal segment (or the minimum if n is less than N+1) which
is not an impulse tends to expand its coverage with each pass of a
non-median operator. Thus, under repeated operations, a signal tends to be
reduced to a constant. That this is true for any signal is shown in the
following properties.


Property 5: Only constant signals are invariant to nth ranked-order
operations if n is not equal to N+1.


Property  6:  If  n  is  not  equal  to  N+1 ,  then  repeated  passes  of  an  nth 
ranked-order  process  will  reduce  any  finite  length  signal  to  a  constant. 
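The reduction to a constant can be observed numerically. The sketch below is illustrative only: the function names, the test signal, and the `max_passes` safety limit are assumptions introduced here, and ranks are counted from the smallest value in the window.

```python
def ranked_order_filter(x, n, N):
    """nth ranked-order operation (rank counted from the smallest value)."""
    padded = [x[0]] * N + list(x) + [x[-1]] * N
    return [sorted(padded[A:A + 2 * N + 1])[n - 1] for A in range(len(x))]


def reduce_by_repeated_passes(x, n, N, max_passes=1000):
    """Apply the operation until the signal stops changing."""
    current = list(x)
    for passes in range(max_passes):
        nxt = ranked_order_filter(current, n, N)
        if nxt == current:
            return current, passes
        current = nxt
    raise RuntimeError("signal still changing after max_passes")


x = [0, 2, 0, 0, 5, 5, 5, 0, 1, 0]
# N = 2, n = 4 > N+1: the three-point region of value 5 expands on each pass.
print(reduce_by_repeated_passes(x, n=4, N=2))
```

For this signal the isolated point of value 2 is too narrow to survive, while the region of value 5 grows by two points per pass until it fills the signal.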


The output of an nth ranked-order operation at position Z is not
influenced by input points more than N points ahead of Z (>Z+N) or more
than N points behind Z (<Z-N). This suggests a method by which long signals
could be segmented and the ranked-order operations on each segment carried
out in parallel:

i) Append the start and stop points as usual.

ii) Divide the signal into overlapping segments. Each overlap is
2N+1 elements wide.

iii) Perform the normal nth ranked-order operation independently on
each segment.

iv) After each operation, replace the last N points of each segment
(except the last segment) with points N+2 through 2N+1 of
the following segment. Also, replace the first N elements of each
signal segment (except the first segment) with the elements from
the 2N+1 through N+2 positions preceding the end of the prior
signal segment.

Now, the signal is the same as it would be had the processing been done
before the segmentation. Thus, further processing can now be done, or the
segments can be recombined to form the final output signal.
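The segmentation idea can be checked with a short sketch. This is a simplified variant rather than a literal transcription of steps i) to iv): instead of exchanging boundary points between neighbouring segments, it keeps only the outputs whose full window lies inside a segment and recombines them directly. All function names and the `seg_len` parameter are assumptions made for this example.

```python
def ranked_order_valid(p, n, N):
    """Outputs for the positions of p whose full 2N+1 window lies inside p."""
    return [sorted(p[A - N:A + N + 1])[n - 1] for A in range(N, len(p) - N)]


def full_filter(x, n, N):
    """Reference: filter the whole (endpoint-extended) signal at once."""
    return ranked_order_valid([x[0]] * N + list(x) + [x[-1]] * N, n, N)


def segmented_filter(x, n, N, seg_len):
    """Filter overlapping segments independently and recombine the results."""
    overlap = 2 * N + 1
    if seg_len <= overlap:
        raise ValueError("seg_len must exceed the overlap of 2N+1")
    padded = [x[0]] * N + list(x) + [x[-1]] * N        # step i
    out, start = [], 0
    while True:
        segment = padded[start:start + seg_len]        # step ii
        valid = ranked_order_valid(segment, n, N)      # step iii (parallelizable)
        # Consecutive segments share exactly one valid output; keep it once.
        out.extend(valid if start == 0 else valid[1:])
        if start + seg_len >= len(padded):
            return out
        start += seg_len - overlap


x = [(7 * i * i + 3 * i) % 11 for i in range(37)]
print(segmented_filter(x, n=4, N=2, seg_len=9) == full_filter(x, 4, 2))
```

Because each output depends only on the 2N+1 points around it, the per-segment calls are independent and could be dispatched to separate workers.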

A signal may be formed from independent identically distributed (iid)
sample points of a random process. Such a signal would be formed if white
noise were sampled to form the input signal. For this type of signal,
results from order statistics [6] may be used to obtain the first order
distribution, $F_y(\cdot)$, and the density, $f_y(\cdot)$, of the output of an
nth ranked-order operation. If the distribution, $F_x(\cdot)$, and the
density, $f_x(\cdot)$, of the input are known, then $f_y(\cdot)$ and
$F_y(\cdot)$ are given by


$$ f_y(x) = \frac{(2N+1)!}{(n-1)!\,1!\,(2N+1-n)!}\,[F_x(x)]^{n-1}\,(1-F_x(x))^{2N+1-n}\,f_x(x) \qquad \text{(Property 7)} $$

$$ F_y(x) = \sum_{K=n}^{2N+1} \frac{(2N+1)!}{K!\,(2N+1-K)!}\,F_x^{K}(x)\,(1-F_x(x))^{2N+1-K} \qquad \text{(Property 8)} $$


where  2N+1  is  the  window  size. 

Kuhlman and Wise [7] will present further statistical analysis of the
median filtering of independent identically distributed random processes in
the next paper. However, the above formulas can immediately be used to prove
that the statistical median of an iid process is preserved under standard
median filtering.

Property 9: A median filter, x(·) → y(·), with an input of iid sample
points will transform the distribution of the input, F_x(·) → F_y(·),
symmetrically about 0.5. That is, if an input distribution value F_x(t) is
mapped to F_y(t), then (1 - F_x(t)) is mapped to (1 - F_y(t)).

Property 10: The statistical median of a signal of iid sample points is
preserved upon median filtering; that is, given t such that F_x(t) = 0.5,
then F_y(t) = 0.5.

Also recall that if the density of the input, f_x(·), is symmetric, then the
mean, E{x}, and the median are equal. Therefore, by properties 9 and 10,
the mean of an iid sample point signal whose density is symmetric is also
preserved under median filtering. However, in general, the actual median
point and the average of a particular signal will not be preserved.
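The order-statistics formulas and the two median properties can be checked numerically. In this sketch the function names `output_cdf` and `output_pdf` are assumptions; they take the input distribution value F_x(x) and density value f_x(x) at a point and evaluate the output distribution and density at the same point.

```python
from math import comb, factorial


def output_cdf(Fx, n, N):
    """Output distribution value for an nth ranked-order operation."""
    M = 2 * N + 1
    return sum(comb(M, K) * Fx**K * (1 - Fx)**(M - K) for K in range(n, M + 1))


def output_pdf(Fx, fx, n, N):
    """Output density value for an nth ranked-order operation."""
    M = 2 * N + 1
    coeff = factorial(M) // (factorial(n - 1) * factorial(M - n))
    return coeff * Fx**(n - 1) * (1 - Fx)**(M - n) * fx


N = 2
median_rank = N + 1
print(output_cdf(0.5, median_rank, N))  # median preserved: value stays at 0.5
p = 0.3
print(output_cdf(p, median_rank, N) + output_cdf(1 - p, median_rank, N))  # symmetry about 0.5
```

For a uniform input on [0,1], F_x(x) = x and f_x(x) = 1, so the density formula should agree with a numerical derivative of the distribution formula.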


Recursive  Operations 

Now consider replacing, at every step, the leftmost N points in the
moving window with the previous N output points, and applying the same
decision rule as was previously given for the nth ranked-order operation to
obtain the next output value. This produces a recursive nth ranked-order
operation, which can be more formally stated as follows:

Y(A) = the nth largest value of {Y(A-N), ..., Y(A-1), X(A), X(A+1), ..., X(A+N)}

where X(A) and Y(A) are the values of the input and the output, respectively,
at position A. The properties of these operations are similar to those of
standard nth ranked-order operations. Most notably, they have the same set
of roots.
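A minimal sketch of the recursive operation follows. The function name is hypothetical, ranks are counted from the smallest value, and the initial "previous outputs" are assumed to equal the appended front endpoints, since the text does not state how the recursion is started.

```python
def recursive_ranked_order_filter(x, n, N):
    """Recursive nth ranked-order operation with window width 2N+1.

    The leftmost N window positions hold previously computed outputs.
    Assumption: before the first sample, the previous outputs equal x[0].
    """
    padded = [x[0]] * N + list(x) + [x[-1]] * N
    y = [x[0]] * N                        # Y(-N), ..., Y(-1) (assumed start-up values)
    for A in range(len(x)):
        previous_outputs = y[A:A + N]                  # Y(A-N), ..., Y(A-1)
        current_inputs = padded[A + N:A + 2 * N + 1]   # X(A), ..., X(A+N)
        y.append(sorted(previous_outputs + current_inputs)[n - 1])
    return y[N:]


# Recursive median (n = N+1) of an alternating signal with N = 1.
print(recursive_ranked_order_filter([0, 1, 0, 1, 0, 1], n=2, N=1))
```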


Property  11 :  A  signal  is  invariant  to  recursive  filtering  if  and  only  if  it 
is  invariant  to  standard  filtering. 


Proof: 

If a signal is invariant to an operation, X(·) = Y(·), then X(k) =
Y(k) for all k. Therefore, if a signal is invariant, then standard
and recursive operations use the same points in the decision rule, and
they must produce the same resulting signal.


However, the same signal will not in general reduce to the same root under
recursive and standard operations. This is illustrated by an example for
the median (n = N+1) filter case in Fig. 4. One may notice that under noisy
conditions, the recursive filter tends to maintain a higher correlation
between points in its output than does its non-recursive counterpart. This
is further illustrated in figure 5, which compares the autocorrelation of
the output for recursive and standard median filters with independent,
uniformly [0,1] distributed input points. These autocorrelation functions
were obtained experimentally from a sequence of 2,200 random points. Thus,
these filters may be useful in cases where more stringent filtering without
a wider window is required.

One  of  the  most  interesting  characteristics  of  the  recursive  operations 

[Figure 4 panels: input signal; output, 1st pass, standard median filter;
output, 2nd pass (root), standard median filter; output, 1st pass (root),
recursive median filter.]

Figure 4: Recursive vs. standard median filters with a window width of 5
(N=2)

Figure 5: Autocorrelation function of standard and recursive
median filters for a window width of five.


is that the root of a signal for a particular recursive process can always
be found after the first pass of the operation. Recursive ranked-order
operations are therefore potentially useful in areas, such as peak
detection and coding operations, which require finding the root of a signal
quickly. The following two properties establish this characteristic.

Property 12: Any signal will be reduced to a root after one pass of a
recursive median filter (n = N+1).

Property 13: If n ≠ N+1, then the last computed output value of a signal
being operated on by a recursive nth ranked-order operation is the value of
the signal root for that operator. For n > N+1 (n < N+1) this value is the
maximum (minimum) value to survive the first filter pass.
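The one-pass behaviour of the recursive median filter can be checked against the standard filter. The sketch below uses hypothetical function names and assumes that the recursion is started with the appended front endpoint values; it verifies that the single-pass recursive output is left unchanged by both filters.

```python
def median_filter(x, N):
    """Standard median filter with first/last values appended at the ends."""
    padded = [x[0]] * N + list(x) + [x[-1]] * N
    return [sorted(padded[A:A + 2 * N + 1])[N] for A in range(len(x))]


def recursive_median_filter(x, N):
    """Recursive median filter; start-up outputs assumed equal to x[0]."""
    padded = [x[0]] * N + list(x) + [x[-1]] * N
    y = [x[0]] * N
    for A in range(len(x)):
        window = y[A:A + N] + padded[A + N:A + 2 * N + 1]
        y.append(sorted(window)[N])
    return y[N:]


x = [0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
y = recursive_median_filter(x, N=2)
print(y)
print(median_filter(y, 2) == y, recursive_median_filter(y, 2) == y)
```

For this signal the standard N = 2 filter needs two passes and ends at a different root, which matches the remark that the two operations need not reduce a signal to the same root.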


Other  Functions 

In addition to the above mentioned operations, many more variations of
the median filter exist. Many of these other variations also have
properties which may be useful in signal processing. We have studied several
such modifications and present some of them here. Many of these
modifications were obtained by defining a set of signal roots with certain
desirable characteristics; then, we developed an operation which would have
as many members of this set as possible for its own roots. Unfortunately, we
have not, as yet, found a systematic method of determining an operation
which will have any particular set of roots. Nevertheless, this approach
does appear to hold promise.



[Figure 6 panels: input; median filter; output (window width = 3).]

Figure 6: Example of a linear-median filter using a differentiator,
integrator pair and a median filter with a window width of
3 (N=1)

One modification to the median filter, which Tukey [1] and Rabiner [2]
have already utilized with promising results, is that of combining linear
and median operations together. This allows one to greatly extend the
number of available effects by utilizing some of the many linear operators
whose properties are already well known. As an example of such an
operation, consider figure 6. Here, a signal is differentiated, median
filtered, and finally integrated. This operation has many of the same
properties as a median filter alone. However, due to the differentiation,
any slope of extent less than N+1 points will be seen by the median filter
as an impulse and, thus, eliminated. Therefore, roots of this operation
cannot contain sharp edges.
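A discrete-time sketch of the differentiate, median filter, integrate chain is given below. The use of a first difference as the differentiator and a running sum as the integrator, as well as the function names, are assumptions made for illustration.

```python
def median_filter(x, N):
    """Standard median filter with first/last values appended at the ends."""
    padded = [x[0]] * N + list(x) + [x[-1]] * N
    return [sorted(padded[A:A + 2 * N + 1])[N] for A in range(len(x))]


def linear_median_filter(x, N):
    """First difference -> median filter -> running sum (assumed discrete forms)."""
    diffs = [x[i] - x[i - 1] for i in range(1, len(x))]
    filtered = median_filter(diffs, N)
    out = [x[0]]                      # the integrator restarts from the first sample
    for d in filtered:
        out.append(out[-1] + d)
    return out


print(linear_median_filter([0, 0, 0, 0, 5, 5, 5, 5], N=1))  # sharp step is removed
print(linear_median_filter([0, 0, 0, 1, 2, 3, 3, 3], N=1))  # gradual slope is kept
```

The sharp step becomes a one-point impulse after differencing and is removed, whereas the three-point slope becomes a three-point constant run in the difference signal and passes through.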

Another  method  of  varying  the  median  filter  is  to  weight  some  positions 
of  the  window  more  heavily  than  others.  This  could  be  done  by  duplicating 
certain  positions  of  the  window.  If  the  center  position,  for  example,  were 
to  be  weighted  by  three,  then  the  output  at  position  A  would  be  given  by 

Y(A) = the median value of {X(A-N), ..., X(A), X(A), X(A), ..., X(A+N)}

Yet another modification would be to allow the value of a given position
of the window to be a linear function of the points (possibly all of them)
inside the window. Thus, the output at position A would be

Y(A) = the median value of {f_1(X(A-N), ..., X(A+N)), ..., f_m(X(A-N), ..., X(A+N))}

where m is the number of values used in the decision process. A simple
example combining the previous two modifications is given in figure 7. In
this example, the points inside the window are first scaled by either -1,
0, or +1; then, the center position is weighted by three, and the median
operation is carried out. The roots of this operation are zero or those
segments of periodicity 4 (X(i) = X(i ± 4)) which are symmetric about
zero. Thus, with some modifications, median type filters can be designed
for a wide range of different roots, including some periodic type signals.
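The scaled and weighted window of this example can be sketched as follows, using the scale factors -1, 0, +1, 0, -1 with the center sample entered three times. The function name and the endpoint handling (first and last values repeated) are assumptions. A signal satisfying x(i+2) = -x(i), which has period 4, should be left unchanged away from the ends.

```python
def modified_window_median(x):
    """Median over {-x(A-2), 0*x(A-1), x(A), x(A), x(A), 0*x(A+1), -x(A+2)}."""
    N = 2
    padded = [x[0]] * N + list(x) + [x[-1]] * N   # assumed endpoint handling
    out = []
    for A in range(len(x)):
        c = A + N                                  # index of x(A) in padded
        values = [-padded[c - 2], 0 * padded[c - 1],
                  padded[c], padded[c], padded[c],
                  0 * padded[c + 1], -padded[c + 2]]
        out.append(sorted(values)[3])              # median of the 7 values
    return out


x = [1, 2, -1, -2] * 4        # period 4 with x(i+2) = -x(i)
y = modified_window_median(x)
print(y[2:-2] == x[2:-2])     # interior points are unchanged
```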


[Figure 7 panels: input; output (root).]

Y(A) = the median value of {-1·x(A-2), 0·x(A-1), x(A), x(A), x(A),
0·x(A+1), -1·x(A+2)}

Figure 7: Median filtering with a modified window function


Conclusion

In this paper, we have examined several variants of the median filter.
We have found that the set of nth ranked-order operations is a
generalization of the median filter, and that they all have many similar
characteristics. However, the non-median operators will, after repeated
passes, reduce any signal to a constant. In contrast, the recursive median
process retains the same set of roots as a standard median filter, though
the same signal may not reduce to the same root under both operations.
However, the recursive median filter reduces any signal to a root in just
one pass, and thus may be useful where high speed root determination is
required. We have also reviewed some examples of other types of modified
median operations, including combined linear-median functions and filters
with modified windows.


REFERENCES 

1. J. W. Tukey, "Nonlinear (Nonsuperposable) Methods for Smoothing Data,"
in Cong. Rec., 1974 EASCON, p. 673.

2. L. R. Rabiner, M. R. Sambur, and C. E. Schmidt, "Applications of a
Nonlinear Smoothing Algorithm to Speech Processing," IEEE Trans. Acoust.,
Speech, and Signal Processing, vol. ASSP-23, pp. 552-557, Dec. 1975.

3. N. S. Jayant, "Average and Median Based Smoothing Techniques for
Improving Digital Speech Quality in the Presence of Transmission Errors,"
IEEE Trans. on Commun., vol. COM-24, pp. 1043-1045, Sept. 1976.

4. T. S. Huang, G. T. Yang, and G. Y. Tange, "A Fast Two-Dimensional
Median Filtering Algorithm," IEEE Trans. Acoust., Speech, and Signal
Processing, vol. ASSP-27, pp. 13-18, Feb. 1979.

5. N. C. Gallagher, Jr. and G. L. Wise, "Passband and Stopband Properties
of Median Filters," Proceedings of the Princeton Conference on
Information Sciences and Systems, March 1980.

6. H. A. David, Order Statistics, (1970), Wiley, New York.

7. F. Kuhlman and G. L. Wise, "On Spectral Characteristic of Median
Filtered Independent Data," Allerton Conference on Communication, Control,
and Computing, October 1980.


THE  DESIGN  OF  MULTIDIMENSIONAL  QUANTIZERS  USING  PREQUANTIZATION 


Kerry  D.  Rines  and  Neal  C.  Gallagher,  Jr. 
School  of  Electrical  Engineering 
Purdue  University 
West  Lafayette,  Indiana  47907 


ABSTRACT 

A novel approach to the design of multidimensional quantizers is
presented. This technique is used to design optimum uniform
multidimensional quantizers that can be operated in real time. The
quantizers are easily implemented using zero memory nonlinearities, linear
transformations and univariate uniform step size quantizers.

I.  INTRODUCTION 

There is considerable interest in the use of multidimensional quantizers
for the encoding of analog sources. Much of this interest has been
generated from a theoretical standpoint. The multivariate quantization
results of Zador [1] point to the advantages of multidimensional
quantizers over univariate quantizers at high bit rates. Simply stated, the
results indicate that the optimum per sample distortion decreases as the
dimension of the quantizer increases. Therefore the potential exists to
improve the performance of digital encoders by replacing univariate
quantizers with multidimensional quantizers.

Recently the design of optimum multidimensional quantizers has been
addressed. Computer algorithms for designing optimum quantizers of two
or more dimensions have been presented by many authors, such as Linde et
al. [2]. The optimum quantizers are implemented using a search procedure
to choose, from a specified output set, the output that is the smallest
distance from the input. This implementation of the optimum quantizer
may be difficult or impossible to operate in real time at high bit rates.
In contrast, the univariate uniform step size quantizer is a zero memory
device that can be operated in real time. To date the easy implementation
and real time operation of the univariate uniform step size quantizer
have outweighed the theoretical advantages of using multidimensional
quantizers in the design of digital encoders.

In  this  paper  we  present  a  novel  approach  to  the  design  of  multidimen¬ 
sional  quantizers  called  prequantization.  The  design  is  illustrated  in 
Figure  1  where  a  zero  memory  nonlinearity  called  a  prequantizer  precedes 
a  specified  multidimensional  quantizer. 


Figure  1.  Multidimensional  Quantizer  Design  using  Prequantization. 

Presented at the Eighteenth Annual Allerton Conference on Communication,
Control and Computing, October 8-10, 1980.


This design is similar in some respects to the companding design of
nonuniform univariate quantizers first proposed by Bennett [3]. In the
univariate case a nonuniform quantizer may be difficult to implement
directly. However, with companding we can design a nonuniform quantizer
using a uniform step size quantizer, an invertible nonlinearity and
the inverse nonlinearity. Similarly, prequantization can be used to
design many multidimensional quantizers. Prequantization enables us to
design these quantizers using a simple multidimensional quantizer, which
is easy to implement and operate in real time, along with a zero memory
nonlinearity. We illustrate the usefulness of prequantization with three
examples.

In a recent paper Gersho [4] considers the partitioning of optimum
uniform multidimensional quantizers. He states that the optimum uniform
two-dimensional quantizer is the hexagonal quantizer. In three
dimensions, Gersho argues that the truncated octahedral quantizer is very
likely to be the optimum uniform three-dimensional quantizer. The analog
of the truncated octahedron is considered for four dimensions. The
resulting quantizer is not known to be optimal for four dimensions, but
does have a lower per sample distortion than the three-dimensional
truncated octahedral quantizer. In this paper we present the designs for
these three quantizers using prequantization. In each case the design is
easy to implement and the quantizer can operate in real time. The real
time operation of these quantizers for high bit rates is a significant
result and demonstrates the important practical applications for
prequantization. We begin in section II with a discussion of the
prequantization design procedure.


II.  PREQUANTIZATION 

The design of k-dimensional quantizers using prequantization is
illustrated in Figure 1. The design consists of a nonlinearity called a
prequantizer preceding a specified k-dimensional quantizer. The
implementation of this design approach takes place in two steps. First a
k-dimensional quantizer meeting a specified criterion is chosen. In this
paper we are interested in real time operation; therefore we specify that
the quantizer be able to operate in real time. Examining Figure 1, we
require that the real time (specified) quantizer have the same set of
output values as the quantizer we wish to design. This is the only
constraint placed on the choice of the real time quantizer. Free to choose
from all quantizers satisfying the output constraint, we choose a real
time quantizer that is easy to implement. The ability to exercise some
control over the choice of the k-dimensional quantizer is one of the
advantages of this design procedure.

The second step in the implementation is the design of the prequantizer.
The role of the prequantizer is to complete the mapping of the input
variables into the desired output values. The real time k-dimensional
quantizer can be characterized by the mapping of its input space into its
output values. This mapping is usually described by a partitioning of
the input space, where all the input vectors contained within one
partition are mapped into the same output vector. Since the real time
quantizer is chosen based only on its output values, we do not expect its
partitioning to be the same as the partitioning of the quantizer being
designed. It is the prequantizer which is used to obtain the partitioning
specified by the desired quantizer design. The prequantizing function
maps a partition specified by the quantizer being designed into a
partition of the real time quantizer that corresponds to the specified
output. Once the prequantizing function is determined the k-dimensional
quantizer design is complete. We illustrate the design procedure with a
simple example.

Consider the design of a univariate quantizer with input x and output
$\hat{x}$ as described in (1).


x'=na  ;  na-^£x<na  +  (1) 

Using the prequantization procedure, we first choose a quantizer that is
easy to implement and has the same output set as given in (1). We choose
the uniform step size quantizer given by

$\hat{y} = n\Delta ; \quad n\Delta - \frac{\Delta}{2} < y < n\Delta + \frac{\Delta}{2}.$  (2)

We  now  determine  the  prequantizing  function  that  must  precede  the  quan¬ 
tizer  in  (2)  to  complete  the  design.  Observe  that  quantizing  y  *  x  - 
in  (2)  is  identical  to  quantizing  x  in  (1).  Thus  the  prequantizing  func¬ 
tion  is  simply  f(x)  *  x  -  ^  and  the  design  of  the  quantizer  in  (1)  is 
complete. 
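
The univariate example can be sketched in a few lines. This is only an illustration under stated assumptions: the offset `c` is a hypothetical value standing in for the shift that appears in (1), and the function names are not from the paper. The sketch checks that a shift followed by the uniform quantizer of (2) reproduces a quantizer whose cells are shifted by `c`.

```python
import math

def uniform_quantize(y, delta):
    """Uniform step-size quantizer as in (2): nearest multiple of delta."""
    return delta * math.floor(y / delta + 0.5)

def prequantizer(x, c):
    """Hypothetical prequantizing function f(x) = x - c (c is an assumed offset)."""
    return x - c

def offset_quantize_prequantized(x, delta, c):
    """Prequantizer followed by the easy-to-implement uniform quantizer."""
    return uniform_quantize(prequantizer(x, c), delta)

def offset_quantize_direct(x, delta, c, n_max=1000):
    """Reference design: search for the cell, shifted by c, that contains x."""
    for n in range(-n_max, n_max + 1):
        if n * delta - delta / 2 + c <= x < n * delta + delta / 2 + c:
            return n * delta
    raise ValueError("x lies outside the searched range")
```

The direct version searches over cells, whereas the prequantized version needs one subtraction and one rounding regardless of how many output levels there are, which is the property the text emphasizes.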

III.  HEXAGONAL  QUANTIZATION 

Gersho has argued that the optimum uniform two-dimensional quantizer
is the hexagonal quantizer. The design of a hexagonal quantizer using
prequantizing is given here. First we attempt to find a two-dimensional
quantizer that can be easily implemented and has the same set of output
values as the hexagonal quantizer. One quantizer meeting these
requirements is a scaled version of the diamond quantizer given below.

Let the inputs to the two-dimensional quantizer be x and y. The
variables x and y are first encoded into two new variables w and z by the
linear transformation

$w = x + \sqrt{3}\, y$
$z = x - \sqrt{3}\, y.$  (3)

The variables w and z are quantized separately by univariate quantizers
with a uniform step size $\Delta$. The outputs of the two-dimensional
quantizer are then obtained using the linear transformation

$\hat{x} = \frac{1}{2}(\hat{w} + \hat{z})$
$\hat{y} = \frac{1}{2\sqrt{3}}(\hat{w} - \hat{z}).$  (4)


The  position  of  this  quantizer  in  the  hexagonal  quantizer  design  is  shown 
in  Figure  2  and  the  partitioning  of  the  scaled  diamond  quantizer  is  given 
in  Figure  3.  Having  chosen  the  two-dimensional  quantizer  given  in  (3) 
and  (4)  we  now  turn  to  the  design  of  the  prequantizer. 

The prequantizer must map the hexagonal region corresponding to each
output into the scaled diamond shaped region corresponding to that same
output. Consider the hexagonal partitioning shown in Figure 4.

Figure 2. Prequantization design for the hexagonal quantizer.
The quantizer Q has uniform step-size $\Delta$.

Figure 3. Partitioning of the scaled diamond quantizer.

Figure 4. Partitioning of the hexagonal quantizer.

Assume x is fixed and the pair (x,y) is contained within a given
hexagonal partition. We now pose the question: does there exist a value x'
such that the pair (x',y) is contained within the corresponding diamond
partition for all values of y? This approach is illustrated with the
following example. Let $x = x_1$ as shown in Figure 3 and let y be in the
range $-\frac{\Delta}{2\sqrt{3}}$ to $\frac{\Delta}{2\sqrt{3}}$. In Figure 4
we observe that the hexagonal quantizer output will be (0,0) for all input
pairs in the set $\{(x_1,y) : y_1 < y < y_2\}$. Similarly in Figure 3 we
observe that the scaled diamond quantizer output will be (0,0) for all
input pairs in the set $\{(x_2,y) : y_1 < y < y_2\}$. Therefore if
$x_2 = f(x_1)$, the quantizer in Figure 2 will behave like the hexagonal
quantizer for all input pairs in the set
$\{(x_1,y) : -\frac{\Delta}{2\sqrt{3}} < y < \frac{\Delta}{2\sqrt{3}}\}$.
In fact, we can show that the quantizer in Figure 2 behaves like the
hexagonal quantizer for all inputs in the set
$\{(x_1,y) : -\infty < y < \infty\}$ when $x_2 = f(x_1)$. Repeating this
example for all possible values of $x_1$, we obtain a prequantizing
function that maps the hexagonal region corresponding to each output into
the scaled diamond shaped region corresponding to that same output. The
prequantizing function is given in (5).

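$f(x) = n\frac{\Delta}{2} ; \quad n\frac{\Delta}{2} - \frac{\Delta}{6} \le x \le n\frac{\Delta}{2} + \frac{\Delta}{6}$  (5)

$f(x) = 3x - (2n+1)\frac{\Delta}{2} ; \quad n\frac{\Delta}{2} + \frac{\Delta}{6} < x < (n+1)\frac{\Delta}{2} - \frac{\Delta}{6}.$
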
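
As a rough check of the hexagonal design, the sketch below chains a prequantizer with the scaled diamond quantizer of (3) and (4) and compares the result against a brute-force nearest-point search on the hexagonal lattice. It assumes the piecewise-linear reading of (5) in which f is constant at n*delta/2 within delta/6 of each multiple of delta/2 and has slope 3 in between; the function names and the brute-force reference are illustrative and not taken from the paper.

```python
import math

SQRT3 = math.sqrt(3.0)

def round_to_step(v, delta):
    """Univariate uniform quantizer with step size delta."""
    return delta * math.floor(v / delta + 0.5)

def scaled_diamond_quantize(x, y, delta):
    """Equations (3) and (4): rotate, quantize w and z separately, rotate back."""
    w = x + SQRT3 * y
    z = x - SQRT3 * y
    wq = round_to_step(w, delta)
    zq = round_to_step(z, delta)
    return 0.5 * (wq + zq), (wq - zq) / (2.0 * SQRT3)

def hex_prequantizer(x, delta):
    """Assumed form of (5): flat near multiples of delta/2, slope 3 in between."""
    half = delta / 2.0
    n = math.floor((x + delta / 6.0) / half)
    r = x - n * half                      # r lies in [-delta/6, delta/3)
    if r <= delta / 6.0:
        return n * half
    return 3.0 * x - (2 * n + 1) * half

def hexagonal_quantize(x, y, delta):
    """Prequantizer applied to x, followed by the scaled diamond quantizer."""
    return scaled_diamond_quantize(hex_prequantizer(x, delta), y, delta)

def nearest_lattice_point(x, y, delta):
    """Brute-force reference: nearest output point of the scaled diamond lattice."""
    half, v = delta / 2.0, delta / (2.0 * SQRT3)
    a0, b0 = round(x / half), round(y / v)
    best = None
    for a in range(a0 - 2, a0 + 3):
        for b in range(b0 - 2, b0 + 3):
            if (a - b) % 2 == 0:          # lattice points have equal parity
                d = math.hypot(x - a * half, y - b * v)
                if best is None or d < best[0]:
                    best = (d, (a * half, b * v))
    return best[1]
```

The fast path uses a fixed number of arithmetic operations per input pair, while the reference search is included only to confirm that both return an equally close lattice point away from cell boundaries.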
IV.  RESULTS  IN  HIGHER  DIMENSIONS 

In this section we present the design of the optimum (or near optimum)
uniform quantizers for three and four dimensions. Each of these
quantizers uses in its design a two-dimensional quantizer termed the
diamond quantizer. The algorithm for the diamond quantizer is as follows.
Let the inputs to the two-dimensional quantizer be x and y. The variables
x and y are first encoded into two new variables w and z by the linear
transformation

$w = x + y$
$z = x - y.$  (6)

The variables w and z are quantized separately by univariate quantizers
with a uniform step size $\Delta$. The outputs of the diamond quantizer are
then obtained from a linear transformation of the quantized variables
$\hat{w}$ and $\hat{z}$ given by


$\hat{x} = \frac{1}{2}(\hat{w} + \hat{z})$
$\hat{y} = \frac{1}{2}(\hat{w} - \hat{z}).$  (7)

The outputs $\hat{x}$ and $\hat{y}$ will be multiples of $\frac{\Delta}{2}$
for all possible inputs. A useful property of the diamond quantizer is
that if either input x or y is a multiple of $\frac{\Delta}{2}$, its
quantized value $\hat{x}$ or $\hat{y}$ will be that same multiple of
$\frac{\Delta}{2}$. Therefore if the output $\hat{x}$ of one diamond
quantizer is used as the input to a second diamond quantizer, the output
of the second diamond quantizer will also be $\hat{x}$. Using this
property we are able to design quantizers of higher dimensions by
cascading diamond quantizers. The results of these designs are now given.
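
The diamond quantizer and the property that makes cascading possible can be illustrated with a short sketch. The function names are illustrative; the step size is the only parameter taken from the text.

```python
import math

def round_to_step(v, delta):
    """Univariate uniform quantizer with step size delta."""
    return delta * math.floor(v / delta + 0.5)

def diamond_quantize(x, y, delta):
    """Diamond quantizer: w = x + y, z = x - y, quantize each, transform back."""
    w, z = x + y, x - y
    wq, zq = round_to_step(w, delta), round_to_step(z, delta)
    return 0.5 * (wq + zq), 0.5 * (wq - zq)
```

If x is already a multiple of delta/2, the first returned value equals x whatever y is (away from rounding ties), so an output of one diamond quantizer passes through a second one unchanged.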

Gersho states that the truncated octahedral quantizer is very likely
the optimum three dimensional quantizer. This quantizer is defined by a
tessellation of a truncated octahedron specified by the set

$\{(x_1,x_2,x_3) : |x_1|+|x_2|+|x_3| <$ [illegible] $; |x_i| <$ [illegible]$,\ i=1,2,3\}$.

The design of this quantizer is given in Figure 5.

Figure 5. The truncated octahedral quantizer design using
prequantization. $Q_D$ is the diamond quantizer.


The  prequantizing  function 


is  given  in  (8)  where  e  =  |xj|  mod(0,-|-). 


For 


f  (x,j  ,Xj)  »n  ^  ;n7’7+e-xlln7  +  7~e  (8) 

=  x^  +  —  e  f  (n-1  )^£x^<n^*y+t. 

A  similar  result  is  obtained  for  <  e  < 

The four dimensional analog of the truncated octahedral quantizer is
defined by the tessellation of the polytope specified by the set

$\{(x_1,x_2,x_3,x_4) : |x_1|+|x_2|+|x_3|+|x_4| < 2\Delta ; |x_i| <$ [illegible]$,\ i=1,2,3,4\}$.

For convenience we will call this quantizer the 4-d uniform quantizer.
The design of the 4-d uniform quantizer is shown in Figure 6.

Figure 6. The 4-d uniform quantizer using prequantization.
$Q_D$ is the diamond quantizer.

The  prequantizing  function  is  given  in  (9>  where  z  *  |xj|  mod(O^)  , 
w  =  I x4 |  mod<0/j>  and  e  =  z+w.  For  e  <  |> 

f<V*3'x4>  55  n  7  ;  <n_1>  i  +  e  -  X1  -  (n+1>!  ”  e  (9) 

*  X1  "  7  +  e  ;  *n+1)  £  "  e  -  X1  1  tn+1>  7 

=  x1  ♦  -  e  ;  (n-1)  4  <.  £  (n-1)  ■£;  ♦  e. 

A  similar  result  is  obtained  for  j  <  e  <  A. 

A comparison of the normalized mean-squared error performance of the
uniform univariate and multidimensional quantizers is given in Table 1.
The results were obtained by computer simulation using 30,000 samples
uniformly distributed over an interval symmetric about zero. The output
alphabet of each quantizer was assigned one hundred quantization levels
per input sample.

Dimension   Quantizer               nmse (x 10^-5)

1           uniform step-size       9.99
2           hexagonal               9.66
3           truncated octahedral    9.48
4           4-d uniform             9.17
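
The tabulated values can be sanity-checked against high-resolution theory. The calculation below is a sketch under assumptions that are not stated in the text: the error is normalized by the variance of an input uniform over an interval of width R, the step is $\Delta = R/100$, and $G_k$ denotes the standard normalized second moments of the interval, hexagon, truncated octahedron, and 24-cell taken from the lattice quantization literature.

```latex
\begin{align*}
\mathrm{NMSE}_1 &= \frac{\Delta^2/12}{R^2/12} = \left(\frac{\Delta}{R}\right)^2
                 = \frac{1}{100^2} = 10.0\times10^{-5},\\
\mathrm{NMSE}_k &\approx \frac{G_k}{G_1}\,\mathrm{NMSE}_1, \qquad G_1 = \tfrac{1}{12},\\
G_2 &= \frac{5}{36\sqrt{3}} \approx 0.0802
      \;\Rightarrow\; \mathrm{NMSE}_2 \approx 9.62\times10^{-5},\\
G_3 &= \frac{19}{192\sqrt[3]{2}} \approx 0.0785
      \;\Rightarrow\; \mathrm{NMSE}_3 \approx 9.43\times10^{-5},\\
G_4 &= \frac{13}{120\sqrt{2}} \approx 0.0766
      \;\Rightarrow\; \mathrm{NMSE}_4 \approx 9.19\times10^{-5}.
\end{align*}
```

These asymptotic figures lie within about one percent of the simulated entries 9.99, 9.66, 9.48, and 9.17, which is consistent with sampling error in a 30,000-sample simulation.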

V.  DISCUSSION 


In this paper we have presented a new approach to the design of
multidimensional quantizers. The usefulness of the prequantization
approach has been demonstrated by the design of three optimum (or near
optimum) uniform multidimensional quantizers. In each example the
quantizer can be implemented using a zero memory nonlinearity, linear
transformations, and univariate uniform step-size quantizers. As a result
the computation time of each quantizer is independent of the output
alphabet size. Therefore, these quantizers are both easy to implement and
able to operate in real time even at very high bit rates.

The prequantization design approach is also compatible with the design
of nonuniform multidimensional quantizers. In [4] Gersho generalizes the
companding technique for the design of nonuniform univariate quantizers
to the design of nonuniform multidimensional quantizers. Bucklew [5]
shows that an optimum k-dimensional quantizer can be designed using an
optimum uniform k-dimensional quantizer, which is preceded by a
multivariate invertible nonlinearity and followed by the inverse
nonlinearity. Therefore the nonlinear prequantizing function used in
optimum uniform k-dimensional quantizers is compatible with, and may even
be an advantage for, the companding approach applied to multidimensional
quantizers.


ACKNOWLEDGMENT 

The  authors  gratefully  acknowledge  the  support  of  the  Air  Force  Office 
of  Scientific  Research  under  grant  AFOSR  78-3605. 

REFERENCES 


[1] P. Zador, Development and Evaluation of Procedures for Quantizing
Multivariate Distributions, Ph.D. Dissertation, Stanford University,
1964, University Microfilm No. 64-9855.

[2] Y. Linde, A. Buzo, R.M. Gray, "An Algorithm for Vector Quantizer
Design," IEEE Trans. Comm., Vol. COM-28, pp. 84-95, January 1980.

[3] W.R. Bennett, "Spectra of Quantized Signals," B.S.T.J., Vol. 27, pp.
446-472, July 1948.

[4] A. Gersho, "Asymptotically Optimal Block Quantization," IEEE Trans.
on Inform. Theory, Vol. IT-25, pp. 373-380, July 1979.

[5] J. A. Bucklew, "Companding and Random Quantization in Several
Dimensions," to be published.




A  novel  approach  for  the  computation  of  orthonormal 
polynomial  expansions 

Gary  L.  Wise  (1),  Neal  C.  Gallagher  (2) 


ABSTRACT 

In this paper we present a novel technique for the computation of orthonormal polynomial
expansions. The proposed method is very straightforward; given a function to be expanded in a
polynomial series, we first use the FFT to compute a vector of Fourier coefficients. Then, using
a change of basis transformation, we go from the Fourier coefficients to the polynomial
coefficients. Convergence properties for this new approach are investigated.


1. INTRODUCTION

Two common ways of representing functions have
been polynomial and trigonometric expansions. In
much of science and engineering the trigonometric
Fourier expansion has dominated over the generalized
Fourier series expansions in applications. One
advantage of the trigonometric series over the polynomial
series is ease of coefficient computation by use of the
fast-Fourier-transform (FFT) algorithm; compared to
the FFT, coefficient computation for polynomial
expansions can be cumbersome and time-consuming.

In this paper we derive a simple change-of-basis
transformation that maps a trigonometric series to a
polynomial series. This transformation has enabled us
to develop an efficient algorithm for the computation
of orthonormal polynomial expansions. The basic plan
of the algorithm is to create a vector of Fourier
coefficients by use of the FFT; this vector is then
multiplied by a transformation matrix, resulting in a
vector of polynomial coefficients. This approach can
offer a saving in computation time over the standard
integral formula for computing these polynomial
coefficients. Section 2 contains the derivation of the
elements of the transformation matrix, and in section 3
a numerical example is presented.


2.  POLYNOMIAL  EXPANSIONS 

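Assume that H(x) is an L2[-T,T] function (where T
is finite), and therefore possesses a Fourier series
expansion convergent in L2[-T,T]. Thus we may write

$H(x) = \sum_{n=-\infty}^{\infty} h_n \exp\left(-\frac{i\pi n x}{T}\right),$

where

$h_n = \frac{1}{2T} \int_{-T}^{T} H(x) \exp\left(\frac{i\pi n x}{T}\right) dx.$

We also assume that

$\int_{-T}^{T} [H(x)]^2\, w(x)\, dx < \infty,$

where w(x) is a nonnegative weight function integrable
over [-T,T]. Let $\theta_n(x)$ denote an nth order polynomial,
and assume that $\{\theta_n(x)\}_{n=0}^{\infty}$ is a set of polynomials
that is orthonormal and complete in L2[-T,T] with
respect to the weight function w(x). Therefore, we
can express H(x) as

$H(x) = \sum_{n=0}^{\infty} a_n \theta_n(x),$

where

$a_n = \int_{-T}^{T} H(x)\, \theta_n(x)\, w(x)\, dx.$  (1)

Define the truncated Fourier series as

$H_M(x) = \sum_{|m| \le M} h_m \exp\left(-\frac{i\pi m x}{T}\right).$

Notice that, by the Schwarz inequality,

$\left| \int_{-T}^{T} [H(x) - H_M(x)]\, \theta_n(x)\, w(x)\, dx \right|^2 \le \int_{-T}^{T} [H(x) - H_M(x)]^2\, w(x)\, dx \int_{-T}^{T} [\theta_n(y)]^2\, w(y)\, dy.$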

(1) G. L. Wise, Department of Electrical Engineering, University of Texas at Austin, Austin,
Texas 78712, USA.

(2) N. C. Gallagher, School of Electrical Engineering, Purdue University, West Lafayette, Indiana
47907, USA.

Journal of Computational and Applied Mathematics, volume 7, no. 3, 1981.


The integral with respect to y equals one by definition.
Since the integral with respect to x is finite, we
know that for any $\epsilon > 0$, there exists a K such that

$\int_{E} [H(x) - H_M(x)]^2\, w(x)\, dx < \epsilon,$

where

$E = \{ x : w(x) > K \},$

and therefore

$\int_{-T}^{T} [H(x) - H_M(x)]^2\, w(x)\, dx < K \int_{-T}^{T} [H(x) - H_M(x)]^2\, dx + \epsilon.$

The first term can be made arbitrarily small by
choosing M sufficiently large.

Thus, we see that

$a_n = \sum_{m=-\infty}^{\infty} h_m c_{mn},$  (2)

where

$c_{mn} = \int_{-T}^{T} \exp\left(-\frac{i\pi m x}{T}\right) \theta_n(x)\, w(x)\, dx,$

and where the convergence is uniform in n.
Consequently, (2) may be written as

$a = hC$  (3)

where h is the row vector of Fourier series coefficients,
a is the row vector of polynomial coefficients, and C
is the matrix whose mn-th element is $c_{mn}$.

After uniform sampling of the function H(x), we can
compute the vector of polynomial coefficients in the
following manner. In practice, a finite number of
elements for h are computed by use of the FFT
algorithm. Then we perform the vector multiplication
indicated by (3). For example, h will be a 2M+1
dimensional row vector, a will be an L dimensional
row vector, and C will be a (2M+1) x L matrix.
Because all computations must be performed using
only a finite number of terms, we are concerned with
the convergence of the resulting coefficients $a_n(2M+1)$
to the correct coefficients $a_n$ given by (1). We see
from the above derivation that this convergence is
uniform in n, where we have neglected aliasing errors
associated with the FFT and machine computation
errors. In the remainder of this paper, it will be
assumed that all computations are done with 2M+1
such sample points of H(x).

Notice that if we take T = 1 and

$w(x) = (1 - x)^{\alpha} (1 + x)^{\beta}$  (4)

where $\alpha > -1$ and $\beta > -1$, the resulting $\theta_n(x)$ are the
normalized Jacobi polynomials given by

$\theta_n(x) = \frac{2^{-n}}{\sqrt{k_n}} \sum_{m=0}^{n} \binom{n+\alpha}{m} \binom{n+\beta}{n-m} (x-1)^{n-m} (x+1)^{m},$

where

$k_0 = \frac{\Gamma(\alpha+1)\,\Gamma(\beta+1)\, 2^{\alpha+\beta+1}}{\Gamma(\alpha+\beta+2)}$

and for $n \ge 1$ [2, p. 169]

$k_n = \frac{2^{\alpha+\beta+1}\, \Gamma(n+\alpha+1)\, \Gamma(n+\beta+1)}{(2n+\alpha+\beta+1)\, \Gamma(n+1)\, \Gamma(n+\alpha+\beta+1)}.$

In this case the elements $c_{mn}$ of the transformation
matrix C may be calculated using a method suggested
by Yao and Thomas [5] (there is an error in equation
(32) in [5]). Utilizing this method we obtain

$c_{mn} = 2\pi\, \Phi_n(-m\pi),$

where [4, p. 284, #3.191-1]

$\Phi_n(t) = D(n,\alpha,\beta)\; t^{-\frac{\alpha+\beta+2}{2}}\; M_{\frac{\alpha-\beta}{2},\, \frac{2n+\alpha+\beta+1}{2}}(2it),$  (5)

$M_{r,s}(t)$ is the Whittaker function [1, p. 264] given by

$M_{r,s}(t) = e^{-t/2}\, t^{s+\frac{1}{2}}\; {}_1F_1\!\left(\tfrac{1}{2} - r + s;\ 2s+1;\ t\right),$

and

$D(n,\alpha,\beta) = \frac{2^{\frac{\alpha+\beta}{2}}\, \Gamma(n+\alpha+1)\, \Gamma(n+\beta+1)}{2\pi \sqrt{k_n}\; \Gamma(n+1)\, \Gamma(2n+\alpha+\beta+2)\, (i)^{\frac{\alpha+\beta+2}{2}}}.$

For $\alpha = \beta$, we obtain the normalized Gegenbauer
polynomials, and in this case (5) becomes

$\Phi_n(t) = \frac{(i)^n\, \Gamma^2(n+\beta+1)\, \Gamma(n+\beta+3/2)\, 2^{2n+3\beta+3/2}\, J_{n+\beta+1/2}(t)}{2\pi \sqrt{k_n}\; \Gamma(2n+2\beta+2)\, \Gamma(n+1)\, t^{\beta+1/2}}.$

Some special classes of normalized Gegenbauer
polynomials are the normalized Legendre polynomials,
both kinds of Chebyshev polynomials, and Tesseral
polynomials. Applications of the above method for
Legendre polynomials may be found in [3].


3. AN EXAMPLE

In this section we present an example of the above
method using Chebyshev polynomials of the first kind.
Let T = 1 and let $\alpha = \beta = -1/2$ in (4).

This results in $\theta_n(x)$ being the normalized nth Chebyshev
polynomial of the first kind. The Chebyshev polynomials
of the first kind can be defined by

$T_{n+1}(x) = 2x\, T_n(x) - T_{n-1}(x)$  (7)
$T_0(x) = 1$
$T_1(x) = x.$

The resulting normalized polynomials are given by

$\theta_n(x) = \sqrt{\frac{2}{\pi}}\; T_n(x), \quad n \ge 1.$

The elements of the transformation matrix are found
to be

$c_{mn} = \sqrt{2\pi}\; (i)^n J_n(-m\pi).$

Therefore, we have that

$a_n = \sqrt{2\pi} \sum_{m=-\infty}^{\infty} h_m\, (i)^n J_n(-m\pi).$

We consider the special case where the function H(x)
is real valued. Using the relations

$h_{-m} = h_m^{*}$

and

$J_n(-m\pi) = (-1)^n J_n(m\pi),$

we have

$a_0 = \sqrt{2\pi}\, h_0 + 2\sqrt{2\pi} \sum_{m=1}^{\infty} \mathrm{Re}(h_m)\, J_0(m\pi)$

$a_n = 2\sqrt{2\pi}\, (-1)^{n/2} \sum_{m=1}^{\infty} \mathrm{Re}(h_m)\, J_n(m\pi), \quad n \text{ even},\ n \ne 0$  (8)

$a_n = 2\sqrt{2\pi}\, (-1)^{\frac{n-1}{2}} \sum_{m=1}^{\infty} \mathrm{Im}(h_m)\, J_n(m\pi), \quad n \text{ odd}.$


We now present an example of the computation of
the Chebyshev polynomial coefficients. The function
H(x) is

$H(x) = \theta_1(x) + \theta_2(x) = \sqrt{\frac{2}{\pi}}\,(2x^2 + x - 1).$

The Chebyshev coefficients are computed by use of
(8); selected coefficients $a_n$ are found in Tables 1 and
2 for the cases N = 4096 and N = 8192, respectively,
where N is the number of equally spaced samples used
in the FFT.
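
The Fourier-to-Chebyshev procedure can be sketched as follows. This is an illustration under several assumptions, not the authors' program: a direct DFT sum stands in for the FFT so that only the standard library is needed, the Bessel functions are evaluated from Bessel's integral, only coefficients with n >= 1 are computed, and the test input `smooth_test_function` is a hypothetical band-limited function chosen so that a small M suffices (unlike the example of this section, whose periodic extension is not smooth). A Gauss-Chebyshev quadrature is included purely as an independent reference.

```python
import cmath
import math

def bessel_j(n, x, steps=256):
    """J_n(x) from Bessel's integral (1/pi) * int_0^pi cos(n*tau - x*sin(tau)) dtau.

    The midpoint rule is accurate here because the integrand is smooth and
    periodic; steps should comfortably exceed |x| + n.
    """
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * math.pi / steps
        total += math.cos(n * tau - x * math.sin(tau))
    return total / steps

def fourier_coefficients(H, M, N):
    """h_m = (1/2) int_{-1}^{1} H(x) exp(i*pi*m*x) dx from N equally spaced samples."""
    xs = [-1.0 + 2.0 * k / N for k in range(N)]
    samples = [H(x) for x in xs]
    return {m: sum(s * cmath.exp(1j * math.pi * m * x)
                   for s, x in zip(samples, xs)) / N
            for m in range(-M, M + 1)}

def chebyshev_via_fourier(H, L, M, N):
    """a_n, n = 1..L, from a_n = sqrt(2*pi) * sum_m h_m * i**n * J_n(-m*pi)."""
    h = fourier_coefficients(H, M, N)
    coeffs = []
    for n in range(1, L + 1):
        total = sum(h[m] * bessel_j(n, -m * math.pi) for m in range(-M, M + 1))
        coeffs.append((math.sqrt(2.0 * math.pi) * (1j ** n) * total).real)
    return coeffs

def chebyshev_via_quadrature(H, L, K=128):
    """Reference a_n from Gauss-Chebyshev quadrature with x = cos(tau)."""
    taus = [(k + 0.5) * math.pi / K for k in range(K)]
    return [math.sqrt(2.0 / math.pi) * (math.pi / K)
            * sum(H(math.cos(t)) * math.cos(n * t) for t in taus)
            for n in range(1, L + 1)]

def smooth_test_function(x):
    """Hypothetical input with only the harmonics m = 1 and m = 2."""
    return math.cos(math.pi * x) + 0.5 * math.sin(2.0 * math.pi * x)
```

For example, `chebyshev_via_fourier(smooth_test_function, 4, 4, 64)` and `chebyshev_via_quadrature(smooth_test_function, 4)` should agree closely, since the Fourier series of this input terminates at m = 2.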

TABLE 1. Selected values for {a_n} with M = 50, 75,
and 100; N = 4096.

         M = 50          M = 75          M = 100         True value

a_0      -5.38 x 10^-3   -5.93 x 10^-3   -6.57 x 10^-3   0
a_1       0.886           0.907           0.919          1
a_2       0.995           0.994           0.993          1
a_3      -0.113          -9.27 x 10^-2   -8.05 x 10^-2   0
a_4      -5.19 x 10^-3   -5.76 x 10^-3   -6.42 x 10^-3   0
a_5      -0.111          -9.17 x 10^-2   -7.98 x 10^-2   0
a_49     -9.28 x 10^-3    6.22 x 10^-3    1.42 x 10^-2   0

TABLE 2. Selected values for {a_n} with M = 50, 75,
and 100; N = 8192.

         M = 50          M = 75          M = 100         True value

a_0      -3.17 x 10^-3   -3.23 x 10^-3   -3.46 x 10^-3   0
a_1       0.886           0.907           0.919          1
a_2       0.997           0.997           0.997          1
a_3      -0.113          -9.27 x 10^-2   -8.05 x 10^-2   0
a_4      -3.09 x 10^-3   -3.15 x 10^-3   -3.38 x 10^-3   0
a_5      -0.111          -9.16 x 10^-2   -7.98 x 10^-2   0
a_49     -9.29 x 10^-3    6.22 x 10^-3    1.42 x 10^-2   0

4.  DISCUSSION 


We have proposed in this paper a novel approach for
computing polynomial expansions from equally spaced
samples. The computation involved in this procedure
falls into three categories:

(1) Compute the transformation matrix C; this
computation need be done only once and the result stored
in computer memory. The same matrix C is used
for the expansion of all functions;

(2) Given the function H(x) to be expanded into the
polynomial series, compute the Fourier series
expansion of H(x) by use of the FFT. This provides a
vector of Fourier coefficients;

(3) Finally, multiply this vector by the matrix C to
produce a vector of polynomial coefficients.

The major sources of computation error with this
procedure are error in the FFT, and truncation error
in matrix multiplication (finite, rather than infinite,
vectors and matrix); these errors can be reduced by
choosing larger values of N in the FFT and M in the matrix
multiplication. If great accuracy is required, then large
values for M and N may be required.

In examining the computation time required to evaluate
polynomial coefficients we will ignore the computation
of the transformation matrix C. If this matrix is
recomputed each time a different function is expanded in
polynomials, then the computation time for C must be
considered. For our purposes, we assume that C is
stored in memory. The matrix multiplication requires
2M + 1 multiplications and 2M additions for each
coefficient; if L coefficients are computed, we then have a
total of L(2M + 1) multiplications. In many cases, the
equations will simplify as in (8). The FFT routine for
computation of the Fourier coefficients requires
(N/2) log2(N) multiplications (for radix 2 FFT).
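
To give a feel for these counts, the following worked figures use the values M = 50 and N = 4096 from Table 1 together with an assumed L = 50 coefficients; the choice of L is illustrative only.

```latex
\begin{align*}
\text{matrix multiplication:}\quad & L(2M+1) = 50 \times 101 = 5050 \ \text{multiplications},\\
\text{radix-2 FFT:}\quad & \tfrac{N}{2}\log_2 N = 2048 \times 12 = 24576 \ \text{multiplications}.
\end{align*}
```

Under these assumptions the FFT accounts for most of the multiplications, and the matrix step adds roughly one fifth as many.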

As a comparison to the approach proposed herein,
consider the computations necessary to evaluate the
integral of (1). First, we partition the interval for
numerical evaluation of the integral. We then use a recursion
relation such as that in (7) to generate values of $\theta_n(x)$
for the chosen partition points. Next a numerical
evaluation procedure such as the trapezoidal rule is
used to evaluate the integral. If the error in the
evaluation is not small enough, the procedure is repeated
with a finer partitioning. It may be necessary to
iterate several times. This general computation
procedure is necessary for each coefficient; hence, if we
want a total of L coefficients, we must evaluate L
integrals in this manner. The actual computer time
taken in evaluating coefficients in this manner varies
greatly from one set of computer code to another.
One may argue advantages for either technique of
coefficient computation; it is possible for direct
integral evaluation to take less time than the FFT
procedure provided a fortuitous partitioning is made;
however, we have found the FFT-matrix multiplication
technique to be particularly simple and efficient.
For comparison purposes consider the example of
section 3. We evaluated the coefficients a_0, a_1, a_2, a_3,
a_4, a_5, and a_49 using the trapezoidal rule and
Simpson's rule, where we took the interval to be
[-0.99999, 0.99999]. In Table 3 we used 601 points
and in Table 4 we used 1201 points. Notice that since
seven coefficients are being evaluated, these
correspond respectively to 4207 and 8407 samples, and
Tables 1 and 2 correspond respectively to 4096 and
8192 samples.


TABLE 3. Selected values for {a_n} using 601 samples.

Coefficient   Trapezoidal rule   Simpson's rule   True value

a_0           0.177              0.282            0
a_1           1.23               1.40             1
a_2           1.25               1.40             1
a_3           0.230              0.399            0
a_4           0.251              0.399            0
a_5           0.231              0.399            0
a_49          0.289              0.458            0

TABLE 4. Selected values for {a_n} using 1201 samples.

Coefficient   Trapezoidal rule   Simpson's rule   True value

a_0           6.85 x 10^-2       0.130            0
a_1           1.11               1.18             1
a_2           1.10               1.18             1
a_3           0.112              0.183            0
a_4           9.72 x 10^-2       0.183            0
a_5           0.112              0.184            0
a_49          0.136              0.206            0

5.  ACKNOWLEDGEMENT 

This research was supported by the Air Force Office
of Scientific Research, Air Force Systems Command,
USAF, under Grants AFOSR-76-3062, AFOSR-78-3605,
and AFOSR-81-0047, and also by the National
Science Foundation under Grant ENG-76-82426.
Appreciation is expressed to F. Kuhlmann for assistance
with the programming.


REFERENCES 

1. ERDELYI A., MAGNUS W., OBERHETTINGER F. and
TRICOMI F. G. : Higher transcendental functions, vol. 1,
McGraw-Hill, New York, 1953.

2. ERDELYI A., MAGNUS W., OBERHETTINGER F. and
TRICOMI F. G. : Higher transcendental functions, vol. 2,
McGraw-Hill, New York, 1953.

3. GALLAGHER N. C., WISE G. L. and ALLEN J. W. : A
novel approach for the computation of Legendre polynomial
expansions, IEEE Trans. Acoustics, Speech, and Signal
Processing, ASSP-26 (1978), pp. 105-106.

4. GRADSHTEYN I. S. and RYZHIK I. M. : Table of integrals,
series, and products, Academic Press, New York, 1965.

5. YAO K. and THOMAS J. B. : On band-limited properties
of Fourier transform pairs of some special functions, Proc.
3rd Ann. Allerton Conf. on Circuit and System Theory,
Monticello, IL, Oct. 1965, pp. 299-309.


Some Results on the Median Filtering
of Signals and Additive White Noise†

T.  A  Nodes  and  N.  C.  Gallagher,  Jr. 


School  of  Electrical  Engineering 
Purdue  University 
West  Lafayette,  Indiana  47907 


Abstract 

The  first  order  distribution  of  the  output  of  a  median  filter  when 
filtering  a  known  signal  plus  additive  white  noise  has  been  derived  and 
is  presented  along  with  some  examples.  In  addition,  two  programs  have 
been  written  to  aid  in  the  design  of  median  filters  for  the  additive 
white  impulse  noise  case  and  some  of  these  results  are  tabulated. 


I. Introduction


Median filtering, a method of signal processing which is easily
implemented on a digital computer, has been used with success in many
applications. These applications include picture processing and speech
processing [1,2,3,4], where it is employed to smooth the signal. Previous
work in developing the properties of the median filter has been limited
to the filtering of deterministic [5] and white noise* (i.i.d.) signals.
Unfortunately, due to the nonlinearity of the median process, the
analysis of the important signal plus additive noise case is not a
direct extension of these simpler cases. In this paper, we present some
results on the filtering of signals plus additive white noise.
Specifically, we have derived the first order output distribution for an
arbitrary given signal and noise distribution. This, along with several
examples, is presented in the second part of the paper. In addition, we
present some results on the effects of additive impulse noise on median
filtered signals. First, however, a review of the standard median
filter is in order.

Median filtering is a discrete time process in which a 2N+1 points
wide window is stepped across an input signal (see Fig. 1). At each
step, the points inside the window are ranked according to their values,
and the median value (mid-point) of the ranked set is taken as the
output value of the filter for that window position. At both ends of the
signal, N endpoints are appended to allow the filter to reach the edges
of the signal. The value of the front endpoints is equal to the value
of the first point of the signal, and the value of the rear endpoints is
equal to the value of the last point of the signal. As an example of
this process, consider Fig. 2. Here, a binary signal of length eleven
(the appended endpoints are drawn with a distinct marker) is median
filtered by three different window widths: N = 1 (2N+1 = 3), N = 2
(2N+1 = 5), and N = 3 (2N+1 = 7). Notice that for the N = 1 case the
signal is unperturbed, while for the N = 2 and N = 3 cases the amount of
structure in the signal is reduced. A number of signal structures which
can be used to describe the properties of median filters can now be
defined.
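
The filtering procedure just described can be sketched in a few lines. The function name and the short test signals are illustrative; the endpoint padding follows the description above (N copies of the first and last sample).

```python
def median_filter(x, N):
    """One pass of a (2N+1)-point median filter over the sequence x.

    N copies of the first sample are appended in front and N copies of the
    last sample behind, so the output has the same length as the input.
    """
    padded = [x[0]] * N + list(x) + [x[-1]] * N
    window = 2 * N + 1
    return [sorted(padded[i:i + window])[N] for i in range(len(x))]
```

With N = 1, an isolated sample such as the 1 in `[0, 0, 1, 0, 0]` is removed, whereas a step such as `[0, 0, 1, 1, 1]` passes through unchanged.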

†The authors gratefully acknowledge the support of the Air Force Office
of Scientific Research under grant AFOSR 78-3605.

Presented at the Eighteenth Annual Allerton Conference on Communications,
Control and Computing, September 30 - October 1, 1981.


Fig. 1: The Median Filter. The output of the median filter, Y(A), is
given by
Y(A) = the median value of {x(A-N),...,x(A-1),x(A),x(A+1),...,x(A+N)}.

Fig. 2: Effects of window size on a median filtered signal. Panels:
input signal x(·); output signal y_1(·) for a window size of 3 (N = 1);
output signal y_2(·) for a window size of 5 (N = 2); output signal
y_3(·) for a window size of 7 (N = 3); output signal y_3(·), 2nd pass,
for a window size of 7 (N = 3).


A  constant  neighborhood  is  a  region  of  at  least  N+1  consecutive 
points  all  of  which  are  identically  valued. 

An  edge  is  a  monotonically  rising  or  falling  set  of  points  sur¬ 
rounded  on  both  sides  by  constant  neighborhoods. 

An impulse is a set of N or fewer points whose values are different
from the surrounding regions and whose surrounding regions are
identically valued constant neighborhoods.

A  root  is  a  signal  which  is  not  modified  by  filtering. 

Gallagher  and  Wise  15,63  have  shown  that,  while  impulses  are  elim¬ 
inated  by  median  filtering,  constant  neighborhoods  and  edges  are  unper- 


turbed,  and  in  fact,  only  signals  composed  solely  of  constant  neighbor¬ 
hoods  and  edges  are  roots  to  the  median  filter.  Again  referring  to  Fig. 
2,  note  that  the  signal  is  a  root  of  the  N*1  median  filter  but  not  for 
filters  with  N  greater  than  one.  However,  after  one  pass  of  the  N*2 
filter  or  two  passes  of  the  N®3  filter  the  resulting  outputs  are  roots 
of  their  respective  filters.  In  fact,  Gallagher  and  Wise  have  also  pro¬ 
ven  that  any  signal  of  length  L  is  reduced  to  its  root  after  at  most 

j*(L-2)  successive  passes  by  any  median  filter.  Furthermore,  any  root 

of  a  median  filter  with  a  particular  window  size  is  also  a  root  of  any 
median  filter  with  a  smaller  window  size. 
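The definitions above can be tried out with a short sketch. It is not the authors' program: the function names are hypothetical, and it assumes the signal is extended at each end by N copies of its end value so that every position has a full window.

```python
from statistics import median


def median_filter(x, n):
    """One pass of a (2n+1)-point median filter.

    Assumption: the signal is padded with n copies of its first and
    last samples so that every position has a full window.
    """
    padded = [x[0]] * n + list(x) + [x[-1]] * n
    return [median(padded[i:i + 2 * n + 1]) for i in range(len(x))]


def filter_to_root(x, n):
    """Repeat the filter until the signal stops changing.

    Returns the root and the number of passes that changed the signal.
    """
    current = list(x)
    passes = 0
    while True:
        nxt = median_filter(current, n)
        if nxt == current:
            return current, passes
        current = nxt
        passes += 1


# A one-point impulse (value 3) followed by a four-point plateau.
x = [0, 0, 0, 3, 0, 0, 1, 1, 1, 1, 0, 0, 0]
root1, passes1 = filter_to_root(x, 1)  # impulse removed, edges untouched
root2, passes2 = filter_to_root(x, 2)  # impulse removed, but an edge moves
```

With N=2 the impulse lies close enough to the rising edge that the edge moves one sample toward it, which is the edge jitter effect discussed in Section III.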

For i.i.d. (white) random signals, Kuhlmann and Wise [8] have derived the
second order statistics of the median filter. They further show that
for all the distributions which they have investigated, which include
most of the common ones, the median filter has a low pass effect on the
signal spectrum, and thus increases the correlation. In fact, this is
also often true with more general signals; however, due to the nonlinear
nature of the filter there are cases where the second moment bandwidth
of a signal is actually increased upon median filtering and thus the
correlation decreased. Thus, one must use some care in applying the low
pass assumption to median filters.


II.  Output  Distribution 

Section I reviewed much of the previous work on properties of median
filtered deterministic and i.i.d. signals. As stated earlier, the more
general case of filtering signals plus additive noise is much more
difficult to analyze. In this section, the first order distribution of
the output of a median filter with a known signal and additive white
noise input is given. This is used in program Dis to compute some
statistics of the output of the median filter, several examples of
which are given.


If the output of the median filter at position m has a distribution of
$F_Y(q,m)$ and the input a distribution of
$F_X(q,i) = F_{noise}(q - s_i)$, where $s_i$ = signal at position i,
then the output distribution is

\[
F_Y(q,m) = \sum_{k=1}^{N} \; \sum_{a_1 < a_2 < \cdots < a_k}
\left[ \prod_{j=1}^{k} \bigl( 1 - F_X(q,\, m-N-1+a_j) \bigr) \right]
\left[ \prod_{j=1}^{2N+1-k} F_X(q,\, m-N-1+b_j) \right]
+ \prod_{i=m-N}^{m+N} F_X(q,i)
\]

where

\[ 2N + 1 = \text{window width}, \]

\[ \{b_1, b_2, \ldots, b_{2N+1-k}\} \cup \{a_1, a_2, \ldots, a_k\} = \{1, 2, \ldots, (2N+1)\}, \]

\[
\sum_{i=a}^{b} f(i) =
\begin{cases}
f(a) + \cdots + f(b) & \text{if } a \le b \\
1 & \text{if } a > b .
\end{cases}
\]

This result comes about from combining all possible combinations of the
points inside the window such that at least N+1 of them have values
$\le q$. It is straightforward to extend this result to obtain the
first order output distribution for any arbitrary input (any arbitrary
random process) if the (2N+1)th order distribution is known at every
position; however, this result is somewhat cumbersome and is not
presented here.

The above equation was incorporated into program Dis to compute the
value of the first order median filter output distribution, $F_Y(q,m)$,
for a signal plus white noise input. This is then used to numerically
evaluate some of the statistical properties of the output at each
position m. Specifically, Dis computes the value of the mean,
$E\{Y_i\}$, the standard deviation, $\sigma_{Y_i}$, the mean square
error, M.S.E. ($= E\{(y_i - s_i)^2\}$), and the absolute error, A.E.
($= E\{|y_i - s_i|\}$), at every position. These terms may then be
plotted as in Fig. 4 through Fig. 7 or averaged over the signal and
tabulated as in Tables 1 and 2. These examples illustrate some of the
effects of median filtering signals plus noise. Two different
distributions (impulsive and gaussian), both with the same two noise
power levels, are used in these examples. The impulse noise used is
double sided symmetric with heights of $\pm 3$ and probabilities
$P_+ = P_- = 0.001$ and $0.05$ for noise powers of $\sigma^2 = 0.018$
and $0.90$ respectively. Likewise, the gaussian noise powers are also
$\sigma^2 = 0.018$ and $0.90$. For comparison, results are also given
for windowed averaging filters.
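As a check on the quoted noise powers, and assuming each sample is an impulse of height $+h$ with probability $P_+$, of height $-h$ with probability $P_-$, and zero otherwise (so that the noise has zero mean when $P_+ = P_-$), the power works out as:

```latex
\[
\sigma^2 = P_+ h^2 + P_- h^2 = 2 P_+ h^2 , \qquad h = 3 ,
\]
\[
2 (0.001)(3)^2 = 0.018 , \qquad 2 (0.05)(3)^2 = 0.90 .
\]
```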

First, consider a constant signal. The results from a constant signal
indicate the effects that the noise distribution by itself has on the
filter output. The results for several such cases using averaging and
median filters are given in Table 1. It can be seen that the average
filter does somewhat better than the median filter when filtering
gaussian noise. This is expected since, for a set window width, the
average filter is the optimum M.S.E. estimator in this case. However,
when impulse noise is present the median filter reduces the output
noise power by orders of magnitude more than the average filter. This
is due to the ability of the median filter to totally eliminate low
probability, high power impulses, which is not possible with linear
systems. In fact, it can be shown that for a fixed window width the
median filter is the optimum MAP estimator in this case. In general,
for constant signals median filters have been found to out-perform
averaging type filters when the tails of the additive noise density are
extensive compared to the gaussian case. Also, certain types of general
signals are particularly suitable for median filtering regardless of
the noise distribution.

Table 1: Mean square error of median and average filter outputs with
constant signal plus noise inputs. Window width = 2N+1.

Input additive              Average Filter                  Median Filter
noise                     N=1       N=3       N=5       N=1       N=3        N=5
Impulse   σ_in²=0.018   6.000E-3  2.573E-3  1.637E-3  5.396E-5  6.285E-10  8.26[illegible]
          σ_in²=0.90    3.000E-1  1.286E-1  8.182E-2  1.305E-1  3.484E-3   1.044E-4
Gaussian  σ_in²=0.018   6.000E-3  2.573E-3  1.637E-3  8.909E-3  4.610E-3   3.215E-3
          σ_in²=0.900   3.000E-1  1.286E-1  8.182E-2  4.[illegible]6E-1  1.902E-1  1.243E-1
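The impulse noise rows of Table 1 can be checked with a few lines of arithmetic. The sketch below is an independent calculation, not the program Dis, and the function names are hypothetical. It assumes a constant signal with each sample equal to +3 with probability p, -3 with probability p, and 0 otherwise, so that the median output is in error only when at least N+1 of the 2N+1 samples are impulses of the same sign.

```python
from math import comb


def same_sign_tail(n, p):
    """P(at least n+1 of 2n+1 independent samples are impulses of one
    given sign), each sample having probability p of being one."""
    w = 2 * n + 1
    return sum(comb(w, k) * p**k * (1 - p)**(w - k)
               for k in range(n + 1, w + 1))


def median_mse_constant(n, p, height=3.0):
    """M.S.E. of the median output for a constant signal plus impulses.
    The output error is +height or -height, each with the tail
    probability above, and zero otherwise."""
    return 2 * height**2 * same_sign_tail(n, p)


def average_mse_constant(n, noise_power):
    """M.S.E. of a (2n+1)-point average of white noise on a constant."""
    return noise_power / (2 * n + 1)


median_low = [median_mse_constant(n, 0.001) for n in (1, 3)]
median_high = [median_mse_constant(n, 0.05) for n in (1, 3, 5)]
average_high = [average_mse_constant(n, 0.90) for n in (1, 3, 5)]
```

Under these assumptions the values agree with the tabulated impulse entries to the printed precision, which suggests the table was computed from the same binomial reasoning.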


As pointed out earlier, many systems generate signals which are not
amenable to the general spectrum separation techniques that ease the
design of linear filters. Often this is due to the presence of sharp
edges in an otherwise low frequency signal. Such structures tend to be
roots of median filters, making the median filter a good alternative for
smoothing such signals. One such signal is used here to illustrate the
effects of median filtering these signals when additive white noise is
present. This signal ranges from -2 to 2 and consists of edges and
constant neighborhoods. Figures 3 through 6 plot the filter output
expected value and standard deviation ($E\{Y_i\}$,
$E\{Y_i\} + \sigma_{Y_i}$, and $E\{Y_i\} - \sigma_{Y_i}$) at each
position as solid lines and the original uncorrupted signal as a
dashed line. For comparison, the results for a windowed average filter
are shown in Fig. 3. As with the median filters, the window width =
2N + 1.

As illustrated above, the median filter does an excellent job of
eliminating impulses (see also Table 3). However, with non-constant
signal structures, other types of errors become prevalent when impulse
noise is present. Foremost among these is edge jitter. This effect is
present even at low noise levels and is not reduced by using larger
windows, as illustrated in Fig. 4a and Fig. 5a. This effect will be
further discussed in Section III. Fig. 5 also shows the effects of
filtering with larger windows. The final peak of the signal is only
five points wide instead of the six (= N + 1) necessary to pass through
an N=5 median filter unperturbed. Fig. 5 also illustrates another error
form which occurs when the width of a plateau or valley approaches N+1
points. One or two impulses of the correct sign located within such a
plateau will cause the whole plateau to drop to the closest point below
it, which can be a substantial change.


[Figures 3, 4, and 5: each has two panels, (a) and (b); the plots are
not reproduced here.]

Fig. 3: For the output, y, of an averaging filter with an input of
signal plus noise, E{y}, E{y} + σ_y, E{y} - σ_y, and the input signal
are plotted for a) N=1 and PN=0.018 and b) N=3 and PN=0.90, where
PN = input noise power and the window width = 2N+1.

Fig. 4: For the output, y, of a median filter with an input of signal
plus impulsive noise, E{y}, E{y} + σ_y, E{y} - σ_y, and the input
signal are plotted for a) N=3 and PN=0.018 (P+ = P- = 0.001 and
Height(imp.) = ±3) and b) N=3 and PN=0.90 (P+ = P- = 0.05 and
Height(imp.) = ±3), where PN = input noise power and the window
width = 2N+1 = 7.

Fig. 5: For the output, y, of a median filter with an input of signal
plus impulsive noise, E{y}, E{y} + σ_y, E{y} - σ_y, and the input
signal are plotted for a) N=5 and PN=0.018 (P+ = P- = 0.001 and
Height(imp.) = ±3) and b) N=5 and PN=0.90 (P+ = P- = 0.05 and
Height(imp.) = ±3), where PN = input noise power and the window
width = 2N+1 = 11.


Conversely, when gaussian noise is present, quite different results
are obtained. As can be seen from Fig. 6, in this case the standard
deviation of the output is much smoother and more nearly constant than
with impulse noise, and the plots resemble the results of the average
filter much more closely than before. This is further illustrated in
Fig. 7, which plots the density of the output of the N=3 median filter
at position 34 (as reviewed in Fig. 6). Notice that while it is shifted
and the standard deviation reduced, it is still fairly smooth,
symmetrical, and bell shaped (although the tails do exhibit some
asymmetry which is unobservable in Fig. 7). This is due to the fact
that gaussian noise perturbs almost all the input points by a small
amount rather than just a few by a large amount, as is the case with
impulse noise. However, the median filter tracks the signal more
closely than the average filter does. A summary of the average M.S.E.s
for the above filters is given in Table 2.

Fig. 6: For the output, y, of a median filter with an input of signal
plus gaussian noise, E{y}, E{y} + σ_y, E{y} - σ_y, and the input
signal are plotted for a) N=3 and PN=0.018 and b) N=5 and PN=0.90,
where PN = input noise power and the window width = 2N+1 = 11.

Fig. 7: The output density (upper curve) of an N=3 median filter with an
input of signal plus gaussian noise (density: lower curve) with
PN=0.018 at position 34 (see Fig. 6).


Table 2: Average mean square error, M.S.E., of filter output filtering a
100 point signal and additive noise (see Fig. 3 through Fig. 7)

Input additive          Average Filt.         Median Filt.
noise                   N=1       N=3         N=3       N=5
Impulse   σ²=0.018    1.125E-1  2.514E-1    3.202E-3  5.590E-2
          σ²=0.90     4.043E-1  3.774E-1    1.502E-1  2.640E-1
Gaussian  σ²=0.018    1.125E-1  2.514E-1    1.333E-2  [missing]
          σ²=0.900    4.043E-1  3.774E-1    3.553E-1  4.280E-1

III.  Impulse  Noise 

The special case of signal plus white impulsive noise is of particular
interest as the median filter appears to perform especially well in
reducing this type of noise. As pointed out above and in Table 3, this
is due to the fact that the probability of an impulse being transferred
to the output of a median filter is small. While this is the
predominant error form for constant signals, when more signal structure
is added other types of errors take over. The problem of edge jitter
appears to be particularly significant. This was shown in Section II
with output standard deviation plots and can be qualitatively explained
as follows. As pointed out above, N+1 impulses of the same sign must be
inside the filter window in order for the output to assume the value of
an impulse. However, if a signal edge is being filtered, then an edge
point, x(t), can be shifted by j < N+1 positions, y(t+j) = x(t), by the
simple presence of j impulses of the correct signs within N+1 positions
of t. Narrow (approximately N+1 positions wide) plateaus and valleys
are also susceptible to impulses; however, these structures are much
less common than edges in most signals.
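A small deterministic case makes the mechanism concrete. The sketch below is illustrative only: the signal values are hypothetical, the filter extends the signal by repeating its end values (an assumption), and a single positive impulse is placed on the low side of a rising step.

```python
from statistics import median


def median_filter(x, n):
    """One pass of a (2n+1)-point median filter with end-value padding
    (the padding rule is an assumption of this sketch)."""
    padded = [x[0]] * n + list(x) + [x[-1]] * n
    return [median(padded[i:i + 2 * n + 1]) for i in range(len(x))]


edge = [0, 0, 0, 0, 1, 1, 1, 1]   # clean step; first high sample at index 4
noisy = list(edge)
noisy[3] = 3                      # one positive impulse just before the edge

clean_out = median_filter(edge, 1)
noisy_out = median_filter(noisy, 1)
```

Here `clean_out` equals the input, since a clean edge is a root, while `noisy_out` is `[0, 0, 0, 1, 1, 1, 1, 1]`: the impulse value itself never reaches the output, but the edge now begins one sample early.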

The distribution of edge jitter, j (where y(t+j) = x(t)), has been
derived. The equations, however, are rather intractable and do not lead
to any particular insight into the process; thus, they will not be
presented here. The distribution was incorporated into program Edg,
which was used to compile Tables 4 and 5. Table 4 lists the standard
deviation of the edge jitter, j, for a number of different window sizes
(window width = 2N + 1) and double sided impulse probabilities. Table 3
should be used in conjunction with Table 4 since the possibility of an
impulse at the output is not incorporated into the standard deviation
computation. Note that if the edge has only two states, then, as seen in
Table 4, the mean square error contributed by each edge is approximately
doubled by increasing N from 1 to 5 if the probability of impulse is
P+ = P- = 0.05, and this ratio decreases with decreasing P+ = P-. This
increase, then, must be reconciled with a corresponding decrease in the
probability of an impulse at the output by a factor of 1,300 for the
same parameters as above. Further information can be obtained from
Table 5, which gives the amount of jitter with 90% certainty. The
listing "impulse" in this table indicates that the probability of the
output assuming the value of an impulse is greater than 10%. The use of
these tables in conjunction with the deterministic properties developed
by Gallagher and Wise [5] should greatly facilitate the design of median
filters used in filtering signals with additive impulse noise, as they
help to quantify the various trade-offs available in such designs.

[Tables 3, 4, and 5 are not legible in this copy. As described in the
text, Table 3 gives the probability of an impulse at the filter output,
Table 4 the standard deviation of the edge jitter for several window
sizes and double sided impulse probabilities, and Table 5 the amount of
jitter with 90% certainty.]
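The size of the reduction in output impulse probability can be estimated directly, since Table 3 is not legible here. The sketch below assumes, as in Section III, that the output takes an impulse value only when at least N+1 of the 2N+1 window samples are impulses of the same sign; the function name is hypothetical and the calculation is not taken from the program Edg.

```python
from math import comb


def output_impulse_probability(n, p):
    """P(at least n+1 of the 2n+1 window samples are impulses of one
    given sign), with per-sample probability p for that sign."""
    w = 2 * n + 1
    return sum(comb(w, k) * p**k * (1 - p)**(w - k)
               for k in range(n + 1, w + 1))


# Reduction in output impulse probability when N goes from 1 to 5
# with P+ = P- = 0.05.
ratio = output_impulse_probability(1, 0.05) / output_impulse_probability(5, 0.05)
```

Under this assumption the ratio comes out at a little over 1.2 x 10^3, the same order as the factor quoted in the text.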


IV.  Conclusion 

The first order distribution of the output of a median filter for a
signal plus white noise input was presented. Using this, the statistics
of several examples with impulsive and gaussian noise were computed and
given. These illustrate some of the properties of median filtering.
Edge jitter and narrow plateau jitter are seen to be the dominant error
modes for impulse noise. For gaussian additive noise, the output more
closely resembles that of an average filter but with a larger standard
deviation and closer tracking of edges. For the additive impulse noise
case, some statistical properties of the edge jitter are tabulated.
These results should aid in the design of median filters since they
illustrate many of the properties the designer can expect from these
filters in the important signal plus white noise case. However, much
more work needs to be done in this area to develop easier to use and
more general descriptions of the properties of the filter while
retaining some quantitative ability.


References 


1. J. W. Tukey, "Nonlinear (Nonsuperposable) Methods for Smoothing
Data," in Conf. Rec., 1974 EASCON, p. 673.

2. T. S. Huang, G. J. Yang, and G. Y. Tang, "A Fast Two-Dimensional
Median Filtering Algorithm," IEEE Trans. Acoust., Speech, and Signal
Processing, vol. ASSP-27, pp. 13-18, Feb. 1979.

3. L. R. Rabiner, M. R. Sambur, and C. E. Schmidt, "Applications of a
Nonlinear Smoothing Algorithm to Speech Processing," IEEE Trans.
Acoust., Speech, and Signal Processing, vol. ASSP-23, pp. 552-557,
Dec. 1975.

4. N. S. Jayant, "Average and Median Based Smoothing Techniques for
Improving Digital Speech Quality in the Presence of Transmission
Errors," IEEE Trans. on Commun., vol. COM-24, pp. 1043-1045, Sept.
1976.

5. N. C. Gallagher, Jr. and G. L. Wise, "Passband and Stopband
Properties of Median Filters," IEEE Trans. Acoust., Speech, and Signal
Processing, to be published.

6. S-G. Tyan. It has come to our attention that S-G. Tyan has also
proved a number of these properties. We have not seen the proofs
and can only speculate as to their form.

7. H. A. David, Order Statistics. Wiley, New York, 1970.

8. F. Kuhlmann and G. L. Wise, "On Spectral Characteristics of Median
Filtered Independent Data," IEEE Trans. on Commun., vol. COM-29, no.
9, pp. 1374, Sept. 1981.

9. Paul F. Velleman, "Definition and Comparison of Robust Nonlinear Data
Smoothing Algorithms," Journal of the American Statistical Assoc.,
pp. 609, Sept. 1980.




IEEE  TRANSACTIONS  ON  INFORMATION  THEORY,  VOL.  IT-27,  NO.  5,  SEPTEMBER  1981 


On a Class of Random Processes Exhibiting Optimal
Nonlinear One-Step Predictors

T. E. McCANNON and NEAL C. GALLAGHER, MEMBER, IEEE

Abstract - Two classes of random processes that exhibit one-step
predictors with optimal nonlinear minimum mean-squared error (MMSE) are
discussed, and conditions for membership to one of these classes are
given. Examples of each class are presented, and the optimal one-step
predictors are given.

I.  Introduction 

The problem of designing minimum mean-squared error
(MMSE) prediction filters is often complicated by the absence of
prior information on the mathematical structure of the optimum
predictor. Historically it has often been assumed that the optimum
implementable predictor is linear, and such well-known
techniques as Wiener-Hopf spectral factorization or the
orthogonality principle are applied to determine the optimum
prediction filters. With the advent of modern digital technology,
nonlinear functions are often easily implemented, and hence a renewed
interest in optimal nonlinear-prediction theory has arisen.

We have previously presented [1] two methods of designing
nonlinear MMSE prediction filters where we have assumed a
polynomial nonlinearity followed by a linear filter. For both of
these design methods, all that is required is knowledge of a finite
number of moments and cross moments of the given random
process. Wise and Gallagher [2] have shown that knowledge of
certain moments is sufficient to specify the conditional
expectation. In this case, the optimum nonlinear-prediction filter is
given by a polynomial in the sample observations.

In this correspondence we point out two classes of kth-order
stationary random processes $\{X_n\}$ possessing as their optimum
MMSE estimate

\[ E\{X_{n+1} \mid X_n, \ldots, X_{n-k+1}\} = f(X_n, \ldots, X_{n-k+1}), \]

where we assume $f(\cdot)$ to be a Borel measurable function that can
be of a nonpolynomial form. The first class we consider corresponds
to the random process being represented by a nonlinear
stochastic-difference equation

\[ X_{n+1} = f(X_n, \ldots, X_{n-k+1}) + U_{n+1}. \tag{1} \]

The second class corresponds to the output obtained from passing a
known process through an invertible zero-memory nonlinearity (ZNL).
Such a process is of the form $X_n = g(Z_n)$, where we know the form of
the predictor for the $\{Z_n\}$ process. This class of problems is of
particular interest, because the best predictor for $X_n$ does not, in
general, involve finding the best prediction for $Z_n$ and using it as
the input value for $g(\cdot)$. The form of the optimal $X_n$ predictor
can be quite complicated.

Manuscript received June 26, 1980. This work was supported by the Air
Force Office of Scientific Research under Grant AFOSR-78-3605.
The authors are with the School of Electrical Engineering, Purdue
University, West Lafayette, IN 47907.

II. Class I

We define a kth-order stationary random process $\{X_n\}$ as
being a Class I random process if and only if $\{X_n\}$ can be
represented by a stochastic difference equation in the form of (1),
where $\{U_n\}$ are independent identically distributed (i.i.d.)
zero-mean random variables with a marginal density given by
$P_U(\cdot)$. Clearly the conditional expectation of $X_{n+1}$, given
the infinite past of the process, is

\[ E\{X_{n+1} \mid X_n, X_{n-1}, \ldots\} = f(X_n, \ldots, X_{n-k+1}). \]

Writing the Chapman-Kolmogorov equation for the kth variate
densities, we obtain

\[
P_X^{(n+1)}(x_{n+1}, \ldots, x_{n-k+2})
= \int \cdots \int P_c^{(n+1)}(x_{n+1}, \ldots, x_{n-k+2} \mid x_n, \ldots, x_{n-k+1})
\, P_X^{(n)}(x_n, \ldots, x_{n-k+1}) \, dx_n \cdots dx_{n-k+1}, \tag{2}
\]

where $P_X^{(n)}(x_n, \ldots, x_{n-k+1})$ is the joint density of
$(X_n, \ldots, X_{n-k+1})$ for the nth sampling instant, and
$P_c^{(n+1)}(x_{n+1}, \ldots, x_{n-k+2} \mid x_n, \ldots, x_{n-k+1})$ is
the conditional density of $(X_{n+1}, \ldots, X_{n-k+2})$ given
$(X_n, \ldots, X_{n-k+1})$ for the (n + 1) sampling instant. From (1),
we obtain

\[
P_X^{(n+1)}(x_{n+1}, \ldots, x_{n-k+2})
= \int P_U[x_{n+1} - f(x_n, \ldots, x_{n-k+2}, x_{n-k+1})]
\, P_X^{(n)}(x_n, \ldots, x_{n-k+1}) \, dx_{n-k+1}. \tag{3}
\]

Since we have assumed that $\{X_n\}$ is kth-order stationary, the kth
variate densities are independent of n, and (3) can be rewritten as

\[
P_X(t, s_1, \ldots, s_{k-1})
= \int P_U[t - f(s_1, \ldots, s_k)] \, P_X(s_1, \ldots, s_k) \, ds_k. \tag{4}
\]
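A short simulation illustrates why the structure of (1) matters for prediction. The sketch below is an illustration under stated assumptions, not an example from the correspondence: it takes k = 1, a hypothetical nonlinearity f(x) = cos(2x), and gaussian driving noise with standard deviation 0.25, and compares the mean-squared error of the predictor f(X_n) with that of the best straight-line predictor fitted to the same data.

```python
import math
import random


def simulate(f, sigma, length, burn_in=1000, seed=0):
    """Generate X_{n+1} = f(X_n) + U_{n+1} with i.i.d. gaussian U_n.
    The burn-in length and seed are arbitrary choices for this sketch."""
    rng = random.Random(seed)
    x = 0.0
    out = []
    for i in range(length + burn_in):
        x = f(x) + rng.gauss(0.0, sigma)
        if i >= burn_in:
            out.append(x)
    return out


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def best_linear_fit(past, future):
    """Least-squares slope and intercept for predicting future from past."""
    mp = sum(past) / len(past)
    mf = sum(future) / len(future)
    cov = sum((a - mp) * (b - mf) for a, b in zip(past, future))
    var = sum((a - mp) ** 2 for a in past)
    slope = cov / var
    return slope, mf - slope * mp


def f(x):
    return math.cos(2.0 * x)   # hypothetical nonlinearity


sigma = 0.25
x = simulate(f, sigma, 20001)
past, future = x[:-1], x[1:]
mse_nonlinear = mse([f(a) for a in past], future)
slope, intercept = best_linear_fit(past, future)
mse_linear = mse([slope * a + intercept for a in past], future)
```

The nonlinear predictor's error should sit close to the driving noise power (0.0625 here), since the only unpredictable part of X_{n+1} is U_{n+1}, while the linear predictor also pays for the part of f it cannot represent.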

We  now  state  the  following  property. 

0018-9448/81/0900-0652$00.75 © 1981 IEEE

Property 1. If $\{X_n\}$ is a kth-order stationary random process
and is representable in the form

\[ X_{n+1} = f(X_n, \ldots, X_{n-k+1}) + U_{n+1}, \]

where $\{U_n\}$ are i.i.d. zero-mean random variables, then the kth
variate density satisfies the integral equation

\[
P_X(t, s_1, \ldots, s_{k-1})
= \int P_U[t - f(s_1, \ldots, s_k)] \, P_X(s_1, \ldots, s_k) \, ds_k.
\]

Our convention is that lower case variables represent the realizations
of upper case random variables. Furthermore, the optimum MMSE
prediction filter is given by

\[ E\{X_{n+1} \mid x_n, \ldots, x_{n-k+1}\} = f(x_n, \ldots, x_{n-k+1}). \]

An equivalent representation can be derived in a straightforward
manner. We begin by noting that the (k + 1)st variate density is
given by

\[
P_X(t, s_1, \ldots, s_k) = P_U[t - f(s_1, \ldots, s_k)] \, P_X(s_1, \ldots, s_k). \tag{5}
\]

We then perform the expectation of
$T \exp\bigl(i \sum_{j=1}^{k} S_j \rho_j\bigr)$, where i is the complex
constant, as follows:

\[
E\Bigl\{T \exp\Bigl(i \sum_{j=1}^{k} S_j \rho_j\Bigr)\Bigr\}
= \int \cdots \int t \exp\Bigl(i \sum_{j=1}^{k} s_j \rho_j\Bigr)
P_U[t - f(s_1, \ldots, s_k)] \, P_X(s_1, \ldots, s_k) \, dt \, ds_1 \cdots ds_k. \tag{6}
\]

Via (1) we see that

\[
E\Bigl\{T \exp\Bigl(i \sum_{j=1}^{k} S_j \rho_j\Bigr)\Bigr\}
= \int \cdots \int f(s_1, \ldots, s_k) \exp\Bigl(i \sum_{j=1}^{k} s_j \rho_j\Bigr)
P_X(s_1, \ldots, s_k) \, ds_1 \cdots ds_k
= E\Bigl\{f(S_1, \ldots, S_k) \exp\Bigl(i \sum_{j=1}^{k} S_j \rho_j\Bigr)\Bigr\}. \tag{7}
\]

Thus we have (7) equivalent to (5). Note that (7) involves the
characteristic function and as such requires knowledge of the kth
variate distribution. However, there may be circumstances where
(7) may be easier to apply than (5).

The expression in (7) is a generalization of a result presented
by Balakrishnan [3] for polynomial nonlinearities $Q(\cdot)$, i.e.,

\[
E\Bigl\{T \exp\Bigl(i \sum_{j=1}^{k} S_j \rho_j\Bigr)\Bigr\}
= E\Bigl\{Q(S_1, \ldots, S_k) \exp\Bigl(i \sum_{j=1}^{k} S_j \rho_j\Bigr)\Bigr\}, \tag{8}
\]

where $Q(\cdot)$ is the optimum MMSE estimator.

III. Class II

Define a stationary random process $\{Z_n\}$ such that

\[ X_n = g(Z_n), \qquad n = \{\ldots, -2, -1, 0, 1, 2, \ldots\}, \tag{9} \]

where $g(\cdot)$ is an invertible function. This particular relation is
of interest because the random process $\{Z_n\}$ might possess a simple
MMSE one-step predictor. For example, suppose that $\{X_n\}$ is
such that we can find a $g(\cdot)$ for which $\{Z_n\}$ is Gaussian. We
then know that the MMSE one-step predictor on $\{Z_n\}$ is linear. We
wish to investigate the best predictor for the $\{X_n\}$ process.

First, it is necessary to define what we will call optimal MMSE
estimators and suboptimal MMSE estimators. We consider an
MMSE estimator to be optimal if

\[
E\{(Y - E\{Y \mid x_1, \ldots, x_k\})^2\}
= E\{(Y - E\{Y \mid x_1, \ldots, x_j\})^2\}, \qquad j > k;
\]

that is, one cannot do better even if more information is available.
Similarly we consider an MMSE estimator to be suboptimal if

\[
E\{(Y - E\{Y \mid x_1, \ldots, x_k\})^2\}
> E\{(Y - E\{Y \mid x_1, \ldots, x_j\})^2\}, \qquad j > k;
\]

that is, one can do better with more information available.

Consider the case where $\{Z_n\}$ is in Class I, as discussed in the
previous section. The known optimal one-step predictor is

\[ E\{Z_{n+1} \mid z_n, \ldots, z_{n-k+1}\} = f(z_n, \ldots, z_{n-k+1}). \]

We know from (1) that the conditional density of $Z_{n+1}$, given
$(Z_n, \ldots, Z_{n-k+1})$, is given by

\[
q(z_{n+1} \mid z_n, \ldots, z_{n-k+1}) = P_U[z_{n+1} - f(z_n, \ldots, z_{n-k+1})], \tag{10}
\]

such that

\[
E\{g(Z_{n+1}) \mid z_n, \ldots, z_{n-k+1}\}
= \int g(z_{n+1}) \, q(z_{n+1} \mid z_n, \ldots, z_{n-k+1}) \, dz_{n+1}.
\]

Hence we can write

\[
E\{g(Z_{n+1}) \mid z_n, \ldots, z_{n-k+1}\}
= \int g(z_{n+1}) \, P_U[z_{n+1} - f(z_n, \ldots, z_{n-k+1})] \, dz_{n+1}. \tag{11}
\]

Employing a change of variable, we can rewrite (11) as

\[
E\{g(Z_{n+1}) \mid z_n, \ldots, z_{n-k+1}\}
= \int g[u + f(z_n, \ldots, z_{n-k+1})] \, P_U(u) \, du. \tag{12}
\]

If we assume that $g(\cdot)$ can be written in the Taylor series

\[ g(x + a) = \sum_{i=0}^{\infty} \frac{g^{(i)}(a)}{i!} x^i, \tag{13} \]

then we can rewrite (12) into the form

\[
E\{g(Z_{n+1}) \mid z_n, \ldots, z_{n-k+1}\}
= \sum_{i=0}^{\infty} g^{(i)}[f(z_n, \ldots, z_{n-k+1})] \, \frac{1}{i!} \int u^i P_U(u) \, du. \tag{14}
\]

Defining

\[ a_i = \frac{1}{i!} \int u^i P_U(u) \, du, \]

and using the fact that $g(\cdot)$ is an invertible function, we can
rewrite (14) into the desired predictor for $\{X_n\}$:

\[
E\{X_{n+1} \mid x_n, \ldots, x_{n-k+1}\}
= \sum_{i=0}^{\infty} a_i \, g^{(i)}\{f[g^{-1}(x_n), \ldots, g^{-1}(x_{n-k+1})]\}. \tag{15}
\]

Note that (15) is valid only when $\{Z_n\}$ belongs to Class I
considered in Section II. This implies that (15) can be completely
determined because the coefficients $a_i$ correspond directly to
knowledge of the marginal moments of the white driving process.
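As an illustration of the series form of the Class II predictor, consider a case chosen here for convenience rather than taken from the correspondence: k = 1, $Z_{n+1} = a Z_n + U_{n+1}$ with gaussian $U_n$ of standard deviation sigma, and $g(z) = e^z$. Every derivative of g is again $e^z$, so the series collapses to $x_n^a \, e^{\sigma^2/2}$, which differs from the naive choice $g(f(g^{-1}(x_n))) = x_n^a$. The sketch below checks this numerically; all names and parameter values are hypothetical.

```python
import math


def gaussian_moment(i, sigma):
    """E[U^i] for zero-mean gaussian U: zero for odd i, and
    sigma^i * (i-1)!! for even i."""
    if i % 2:
        return 0.0
    double_factorial = 1
    for j in range(i - 1, 0, -2):
        double_factorial *= j
    return sigma ** i * double_factorial


def predictor_series(x_n, a, sigma, terms=40):
    """Truncated series predictor for g = exp and f(z) = a*z with k = 1."""
    z_hat = a * math.log(x_n)              # f[g^{-1}(x_n)]
    total = 0.0
    for i in range(terms):
        a_i = gaussian_moment(i, sigma) / math.factorial(i)
        total += a_i * math.exp(z_hat)     # i-th derivative of exp is exp
    return total


def predictor_closed_form(x_n, a, sigma):
    """Conditional mean of exp(a*z_n + U) for gaussian U."""
    return x_n ** a * math.exp(sigma ** 2 / 2.0)


series_value = predictor_series(2.0, 0.8, 0.5)
closed_value = predictor_closed_form(2.0, 0.8, 0.5)
naive_value = 2.0 ** 0.8                   # g applied to the best Z prediction
```

The series and closed-form values agree, and both exceed the naive value by the factor $e^{\sigma^2/2}$, consistent with the remark in the Introduction that the best predictor for $X_n$ is not obtained by passing the best prediction of $Z_n$ through $g(\cdot)$.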


Suppose  that  the  random  process  (Z„)  is  characterized  by  a 
suboptima]  MMSE  one-step  predictor  of  the  form 

£{Z..il*..  -  =  *('„.  * .  i ).  06) 

where  in  the  previous  example  we  assumed  the  optimal  predictor 
to  be  of  this  form.  In  this  case,  we  do  not  know  the  conditional 
density  corresponding  to  (10).  If  we  define  («„}  as  a  random 
process  denoting  the  error  at  each  sampling  instant  between  the 
optimal  estimate  and  the  suboptimal  estimate  and  let  P,{) 
denote  its  marginal  density  at  the  nth  sampling  instant,  we  can 
then  write  the  conditional  density  of  Z„,|t  given 
<Z„.  •  ,Z„  as 

</(  *>i  •  l!  '  '  ‘  -rii  *  •  I  ) 

=  ll*».  -*♦!.«.  4  1  I  W«...  I.  (17) 

where  r(  •  |  • )  denotes  the  conditional  density  of  Z„  , ,  given 
<  Z„,  -  •  -  ,Z„  t , ,)  and  Because  h(  )  is  an  MMSE  estimator, 
•  i)  =  0.  We  can  then  write  the  recursive  relation 

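$$Z_{n+1} = h(Z_n, \ldots, Z_{n-k+1}) + \epsilon_{n+1} + P_{n+1} \tag{18}$$

describing the process $\{Z_n\}$, where $\{P_n\}$ is a zero-mean white
driving process. Equation (18) is then the suboptimal analog to
(1). From (18) we can write

$$r(z_{n+1} \mid z_n, \ldots, z_{n-k+1}, \epsilon_{n+1}) = P_p[z_{n+1} - h(z_n, \ldots, z_{n-k+1}) - \epsilon_{n+1}]. \tag{19}$$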

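Substituting (19) into (17), we then have

$$q(z_{n+1} \mid z_n, \ldots, z_{n-k+1}) = \int P_p[z_{n+1} - h(z_n, \ldots, z_{n-k+1}) - \epsilon_{n+1}]\, P_{\epsilon,n+1}(\epsilon_{n+1})\, d\epsilon_{n+1}.$$

We compute $E\{g(Z_{n+1}) \mid z_n, \ldots, z_{n-k+1}\}$ as before and write

$$E\{g(Z_{n+1}) \mid z_n, \ldots, z_{n-k+1}\} = \iint g(z_{n+1})\, P_p[z_{n+1} - h(z_n, \ldots, z_{n-k+1}) - \epsilon_{n+1}]\, P_{\epsilon,n+1}(\epsilon_{n+1})\, d\epsilon_{n+1}\, dz_{n+1}. \tag{20}$$

Employing a change of variable, we put (20) into the form

$$E\{g(Z_{n+1}) \mid z_n, \ldots, z_{n-k+1}\} = \int g[p + h(z_n, \ldots, z_{n-k+1})] \int P_p(p - \epsilon)\, P_{\epsilon,n+1}(\epsilon)\, d\epsilon\, dp. \tag{21}$$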

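Again, if we assume that $g(\cdot)$ can be expanded into a Taylor
series, remembering that $g(\cdot)$ is an invertible function, and upon
defining

$$b_{n+1,i} = \frac{1}{i!} \iint p^i\, P_p(p - \epsilon)\, P_{\epsilon,n+1}(\epsilon)\, d\epsilon\, dp, \tag{22}$$

we obtain

$$E\{X_{n+1} \mid x_n, \ldots, x_{n-k+1}\} = \sum_{i=0}^{\infty} b_{n+1,i}\, g^{(i)}\big(h[g^{-1}(x_n), \ldots, g^{-1}(x_{n-k+1})]\big). \tag{23}$$

If we assume that the marginal error density is independent of n,
(23) can be simplified to

$$E\{X_{n+1} \mid x_n, \ldots, x_{n-k+1}\} = \sum_{i=0}^{\infty} c_i\, g^{(i)}\big(h[g^{-1}(x_n), \ldots, g^{-1}(x_{n-k+1})]\big), \tag{24}$$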

where we set $b_{n+1,i} = c_i$. In most cases, the marginal density
$P_{\epsilon,n+1}(\cdot)$ will be difficult, if not impossible, to obtain. For this
reason, (23) and (24) should only be interpreted as providing a
functional form for the prediction filter, and the coefficients
$b_{n+1,i}$ and $c_i$ should be obtained through some procedure which
minimizes the quantity

$$E\{(X_{n+1} - E(X_{n+1} \mid x_n, \ldots, x_{n-k+1}))^2\}.$$
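
As an illustration of how the series form in (24) relates to the integral in (21), the following sketch compares the two numerically. It is only a check under stated assumptions: the nonlinearity g(z) = z**3, the uniform density used for the sum of the driving noise and the error, and all function names are hypothetical choices made for this example and do not come from the paper.

```python
import math

def moment_coeff(i, density, lo, hi, steps=20000):
    """(1/i!) * integral of p**i * density(p) over [lo, hi], by the midpoint rule."""
    dp = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        p = lo + (k + 0.5) * dp
        total += p ** i * density(p)
    return total * dp / math.factorial(i)

def series_predictor(h, derivs, density, lo, hi):
    """Sum over i of c_i * g^(i)(h), the functional form of (24)."""
    return sum(moment_coeff(i, density, lo, hi) * d(h)
               for i, d in enumerate(derivs))

def direct_expectation(g, h, density, lo, hi, steps=20000):
    """Integral of g(h + p) * density(p) dp, the form of (21)."""
    dp = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        p = lo + (k + 0.5) * dp
        total += g(h + p) * density(p)
    return total * dp

# Hypothetical example: g(z) = z**3, total noise uniform on [-1, 1].
g = lambda z: z ** 3
derivs = [g, lambda z: 3 * z ** 2, lambda z: 6 * z, lambda z: 6.0]
uniform = lambda p: 0.5

series_value = series_predictor(0.4, derivs, uniform, -1.0, 1.0)
direct_value = direct_expectation(g, 0.4, uniform, -1.0, 1.0)
```

Because g is a polynomial here, the Taylor series terminates and the two values should agree up to quadrature error; for a general g the series would have to be truncated.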

IV. Examples

A. Class I Random Process (k = 1)

Consider the random process characterized by the nonlinear
stochastic difference equation

$$X_{n+1} = f(X_n) + W_{n+1},$$

where $\{W_n\}$ are i.i.d. zero-mean random variables. For the case
where k = 1, (7) becomes

$$P_x^{(n+1)}(x_{n+1}) = \int P_w[x_{n+1} - f(x_n)]\, P_x^{(n)}(x_n)\, dx_n.$$

We now make use of the following theorem proved in the
Appendix.

Theorem: If the random process $\{X_n\}$ can be characterized by

$$X_{n+1} = f(X_n) + U_{n+1},$$

where

1) $P_u(\cdot)$ is strictly positive and uniformly continuous on a
finite closed support $\Omega_u$, and

2) $f: \Omega_u \to \Omega_f$ such that $\{v: v = u + f,\ u \in \Omega_u,\ f \in \Omega_f\} \subset \Omega_u$
and $f(\cdot)$ is continuous on $\Omega_u$,

then the densities $P_x^{(n)}(x_n)$ converge to a steady state limiting
density $P_x(x_n)$ with finite closed support $\Omega_x$. Because the marginal
densities possess steady state limits, the random process $\{X_n\}$
is asymptotically first-order stationary. Hence if $\{X_n\}$ satisfies
the conditions of this theorem, then $\{X_n\}$ belongs to Class I with


B. Class II Random Process

We consider a particular example of Class II. We assume that
$\{Z_n\}$ is a zero-mean Gaussian random process and that we have a
suboptimal characterization implying that either (23) or (24)
applies. Consider

$$h(z_n, \ldots, z_{n-k+1}) = c z_n. \tag{25}$$

Applying the orthogonality principle to obtain the coefficient c,
we find that the suboptimal one-step predictor is given by

$$h(Z_n, \ldots, Z_{n-k+1}) = \rho Z_n,$$

where

$$\rho = \frac{E\{Z_{n+1} Z_n\}}{E\{Z_n^2\}}.$$

Therefore, the random variables $(Z_{n+1} - \rho Z_n)$ are uncorrelated
with the random variable $Z_n$ at each sampling instant. For this




example (18) becomes

$$Z_{n+1} = \rho Z_n + \epsilon_{n+1} + P_{n+1}, \tag{26}$$

where we require $P_n \sim N(0, \sigma_p^2)$. Consequently, $\epsilon_n \sim N(0, \sigma_{\epsilon,n}^2)$,
and at a fixed sampling instant $Z_n$ and $(\epsilon_{n+1} + P_{n+1})$ are
independent. Since $\epsilon_{n+1}$ can only depend on $\{P_i,\ i \le n\}$, $P_{n+1}$ and
$\epsilon_{n+1}$ are independent as a result of the whiteness of $\{P_n\}$.
Because $\{Z_n\}$ is stationary, the prediction error variance must be
independent of n, and since $\sigma_p^2$ is a constant, $\sigma_{\epsilon,n}^2 = \sigma_\epsilon^2$ must also
be a constant.

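Recall from (22) that

$$c_i = \frac{1}{i!} \iint p^i\, P_p(p - \epsilon)\, P_\epsilon(\epsilon)\, d\epsilon\, dp. \tag{27}$$

If we rewrite (27) into

$$\frac{1}{i!} \int p^i \left[ \int P_p(p - \epsilon)\, P_\epsilon(\epsilon)\, d\epsilon \right] dp,$$

we notice that the integral within the brackets corresponds to the
density of the sum, $P_{n+1} + \epsilon_{n+1}$. Hence

$$\int P_p(p - \epsilon)\, P_\epsilon(\epsilon)\, d\epsilon = \frac{1}{\sqrt{2\pi(\sigma_p^2 + \sigma_\epsilon^2)}} \exp\left[-\frac{p^2}{2(\sigma_p^2 + \sigma_\epsilon^2)}\right]$$

and (27) becomes

$$c_i = \frac{1}{i!} \int p^i\, \frac{1}{\sqrt{2\pi(\sigma_p^2 + \sigma_\epsilon^2)}} \exp\left[-\frac{p^2}{2(\sigma_p^2 + \sigma_\epsilon^2)}\right] dp.$$

Using the moments of a zero-mean normal random variable [6], we obtain

$$c_i = \begin{cases} \dfrac{1}{i!}\,[1 \cdot 3 \cdots (i-1)]\,(\sigma_p^2 + \sigma_\epsilon^2)^{i/2}, & i \text{ even} \\ 0, & i \text{ odd.} \end{cases} \tag{28}$$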
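
The closed form in (28) can be checked against direct numerical integration of the Gaussian moment integral. The sketch below is only illustrative; the function names, the integration width, and the step count are assumptions made for this example.

```python
import math

def c_coeff(i, total_var):
    """Closed form of (28): (1/i!) * 1*3*...*(i-1) * total_var**(i/2), zero for odd i."""
    if i % 2 == 1:
        return 0.0
    odd_product = math.prod(range(1, i, 2))  # empty product equals 1 when i == 0
    return odd_product * total_var ** (i // 2) / math.factorial(i)

def c_coeff_numeric(i, total_var, steps=40000, width=12.0):
    """(1/i!) * integral of p**i times the N(0, total_var) density (midpoint rule)."""
    s = math.sqrt(total_var)
    lo, hi = -width * s, width * s
    dp = (hi - lo) / steps
    norm = math.sqrt(2.0 * math.pi * total_var)
    acc = 0.0
    for k in range(steps):
        p = lo + (k + 0.5) * dp
        acc += p ** i * math.exp(-p * p / (2.0 * total_var)) / norm
    return acc * dp / math.factorial(i)

# total_var plays the role of sigma_p**2 + sigma_eps**2; 0.7 is an arbitrary value.
closed = [c_coeff(i, 0.7) for i in range(6)]
numeric = [c_coeff_numeric(i, 0.7) for i in range(6)]
```

Only the sum of the two variances enters the coefficients, which is why a single `total_var` argument suffices.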

Because $Z_n$ and $(\epsilon_{n+1} + P_{n+1})$ are independent, we can use (26)
to obtain the expression

$$\sigma^2 = \rho^2 \sigma^2 + (\sigma_p^2 + \sigma_\epsilon^2), \tag{29}$$

where we have used the fact that $\{Z_n\}$ is stationary. Substituting
(29) into (28), we obtain for the predictor coefficients

$$c_i = \begin{cases} \dfrac{1}{i!}\,[1 \cdot 3 \cdots (i-1)]\,(1 - \rho^2)^{i/2} \sigma^i, & i \text{ even} \\ 0, & i \text{ odd.} \end{cases} \tag{30}$$

Finally, by substitution of (25) into (24) and using the nonlinearity
$g(z) = z^5$, we obtain for the form of the predictor

$$E\{X_{n+1} \mid x_n\} = c_0 \rho^5 x_n + 5 c_1 \rho^4 x_n^{4/5} + 20 c_2 \rho^3 x_n^{3/5} + 60 c_3 \rho^2 x_n^{2/5} + 120 c_4 \rho\, x_n^{1/5} + 120 c_5.$$

From (30) we obtain the coefficient values

$$c_0 = 1, \quad c_2 = \tfrac{1}{2}(1 - \rho^2)\sigma^2, \quad c_4 = \tfrac{1}{8}(1 - \rho^2)^2 \sigma^4, \quad c_1 = c_3 = c_5 = 0.$$

We thus obtain for the MMSE one-step predictor

$$E\{X_{n+1} \mid x_n\} = 15 \rho (1 - \rho^2)^2 \sigma^4 x_n^{1/5} + 10 \rho^3 (1 - \rho^2) \sigma^2 x_n^{3/5} + \rho^5 x_n.$$


Appendix
Proof of Theorem

Theorem: If the random process $\{X_n\}$ can be characterized by

$$X_{n+1} = f(X_n) + U_{n+1},$$

where

1) $P_u(\cdot)$ is strictly positive and uniformly continuous on a
finite closed support $\Omega_u$, and

2) $f: \Omega_u \to \Omega_f$ such that $\{v: v = u + f,\ u \in \Omega_u,\ f \in \Omega_f\} \subset \Omega_u$
and $f(\cdot)$ is continuous on $\Omega_u$,

then the densities $P_x^{(n)}(x_n)$ converge to a steady state limiting
density $P_x(x_n)$ with finite closed support $\Omega_x$.

Proof: We need to show

1) $\Omega_x$ bounded and closed and

2) $P_u(\cdot)$ is strictly positive and regular

at which point we can then apply the results due to Feller [5].

1) $\Omega_x$ Bounded and Closed: Consider first $\Omega_n$ = (support of
$P_x^{(n)}(\cdot)$). We show by induction that $\Omega_n$ is bounded and closed
for all n. For n = 0, $f: \Omega_0 \to \Omega_f$, and because $\Omega_0$ is bounded and
closed and $f(\cdot)$ is continuous on $\Omega_0$, then $\Omega_f$ is bounded and
closed. Now,

$$\Omega_1 = \{v: v = u + f,\ u \in \Omega_u,\ f \in \Omega_f\}.$$

To prove $\Omega_1$ is bounded, first assume that $\Omega_1$ is not bounded;
then there exists $\{f_j\} \in \Omega_f$ such that $f_j \to \infty$ or $\{u_j\} \in \Omega_u$ such
that $u_j \to \infty$ or both so that $v_j = (u_j + f_j) \to \infty$. But $\Omega_f$ and $\Omega_u$
are bounded. Hence $\Omega_1$ is bounded.

To prove $\Omega_1$ is closed, for any $f_0 \in \Omega_f$ there exists $\{f_j\} \in \Omega_f$
such that $\{f_j\} \to f_0$, and for any $u_0 \in \Omega_u$ there exists $\{u_j\} \in \Omega_u$
such that $u_j \to u_0$. Form the sequence $v_j = u_j + f_j$, $v_j \to v_0 = f_0 + u_0$.
But $v_0 \in \Omega_1$ for all such sequences in $\Omega_1$. Hence $\Omega_1$ is
closed.

Assume $\Omega_n$ is bounded and closed. By the argument above,
$\Omega_{n+1}$ is bounded and closed. Hence $\Omega_n$ is bounded and closed for
all n.

From condition (2), we know that $f: \Omega_u \to \Omega_f$ such that
$\{v: v = u + f,\ u \in \Omega_u,\ f \in \Omega_f\} \subset \Omega_u$, and that
$\Omega_{n+1} = \{v: v = u + f,\ u \in \Omega_u,\ f \in \Omega_f\}$. Hence
$\Omega_{n+1} \subseteq \Omega_n \subseteq \Omega_{n-1} \subseteq \cdots \subseteq \Omega_1 \subseteq \Omega_0$.
The support of $P_x(\cdot)$ is given by

$$\Omega_x = \Omega_u \cap \left( \bigcap_{n=1}^{\infty} \Omega_n \right).$$

Since $\Omega_n$ is bounded and closed for all n and $\Omega_u$ is bounded and
closed by assumption, then $\Omega_x$ is bounded and closed. Also, since
$\Omega_x \subseteq \Omega_u$ and $P_u(\cdot) > 0$ and uniformly continuous on $\Omega_u$, then
$P_u(\cdot) > 0$ and uniformly continuous on $\Omega_x$.

2) The Kernel $P_u(\cdot)$ Is Strictly Positive and Regular: We have
already shown $P_u(\cdot) > 0$ and uniformly continuous on $\Omega_x$.

Definition (Feller): The kernel is regular if the family of
transforms $P_x^{(n)}(\cdot)$ are equicontinuous whenever $P_x^{(0)}(\cdot)$ is
uniformly continuous in $\Omega_x$.

We note that $P_x^{(1)}(\cdot) = P_u(\cdot)$. Hence $P_x^{(1)}$ is uniformly
continuous on $\Omega_x$. We have that

$$P_x^{(n)}(\phi) = \int_{\Omega_{n-1}} P_u[\phi - f(z)]\, P_x^{(n-1)}(z)\, dz.$$

Look at the expression $\phi - f(z)$. We have that $\phi \in \Omega_n$, which
is bounded and closed, and $f: \Omega_{n-1} \to \Omega_f$, which is bounded and
closed. Define

$$\Omega_a = \{p: p = \phi - f,\ \phi \in \Omega_n,\ f \in \Omega_f\}.$$

By the same argument we used to show $\Omega_n$ is bounded and closed
when $\Omega_{n-1}$ is bounded and closed, we can state $\Omega_a$ is bounded
and closed. Since $\Omega_a$ is compact and $P_u(\cdot)$ is uniformly
continuous, then there exist $a_{n0} \in \Omega_a$ such that $P_u(a_{n0}) = \sup_{x \in \Omega_a} P_u(x)$.
Define $M = \max_n \{|P_u(a_{n0})|\}$. Then because

$$|P_x^{(n)}(\phi)| \le \int_{\Omega_{n-1}} |P_u[\phi - f(z)]| \cdot |P_x^{(n-1)}(z)|\, dz$$

we have

$$|P_x^{(n)}(\cdot)| \le M \int_{\Omega_{n-1}} |P_x^{(n-1)}(z)|\, dz = M, \quad \text{for all } n.$$

Recalling that

$$P_x^{(n)}(\phi) = \int_{\Omega_{n-1}} P_u[\phi - f(z)]\, P_x^{(n-1)}(z)\, dz,$$

we can immediately write

$$|P_x^{(n)}(\phi') - P_x^{(n)}(\phi'')| \le \int_{\Omega_{n-1}} |P_u[\phi' - f(z)] - P_u[\phi'' - f(z)]| \cdot |P_x^{(n-1)}(z)|\, dz.$$

Define $W = \max_n \int_{\Omega_{n-1}} dz < \infty$. Pick an arbitrary $z_0 \in \Omega_{n-1}$.
Given $\epsilon > 0$, let $\delta = \delta(\epsilon) > 0$ such that $|\phi' - \phi''| \le \delta$. Then
$|P_u[\phi' - f(z_0)] - P_u[\phi'' - f(z_0)]| \le \epsilon / MW$, because $P_u(\cdot)$ is
uniformly continuous on $\Omega_n$ for all n. Hence
$|P_x^{(n)}(\phi') - P_x^{(n)}(\phi'')| \le (\epsilon / MW) \cdot MW = \epsilon$, for all n and $P_x^{(n)}(\cdot)$;
therefore, $P_x^{(n)}(\cdot)$ are equicontinuous and the kernel $P_u(\cdot)$ is regular. We
now appeal to the following theorems:

Theorem 3 (Feller): Every strictly positive regular kernel on a
bounded closed interval is ergodic;

Theorem 4 (Feller): A strictly positive regular kernel is ergodic
if and only if it possesses a strictly positive stationary probability
distribution; where $P_x(x)$ has support $\Omega_x$ which is bounded and
closed.


References 

[1] T. E. McCannon, N. C. Gallagher, G. L. Wise, and D. Minoo-Hamedani,
"A novel approach for designing nonlinear discrete time filters: Part II,"
in Proc. 16th Ann. Allerton Conf. Communication, Control and Computing,
Oct. 4-6, 1978.

[2] G. L. Wise and N. C. Gallagher, "On the determination of regression
functions," in Proc. 17th Ann. Allerton Conf. Communication, Control and
Computing, Oct. 10-12, 1979.

[3] A. V. Balakrishnan, "On a characterization of processes for which optimal
mean-square systems are of specified form," IRE Trans. Inform. Theory,
vol. IT-6, pp. 490-500, Sept. 1960.

[4] H. E. Henry and P. M. Schultheiss, "The analysis of certain nonlinear
feedback systems with random inputs," IRE Trans. Inform. Theory, vol.
IT-8, pp. 25-29, July 1962.

[5] W. Feller, An Introduction to Probability Theory and Its Applications.
New York: Wiley, 1965, pp. 270-272.

[6] A. Papoulis, Probability, Random Variables and Stochastic Processes. New
York: McGraw-Hill, 1965, pp. 147-148.

[7] H. L. Royden, Real Analysis. New York: Macmillan, 1968.




IEEE  TRANSACTIONS  ON  ACOUSTICS,  SPEECH,  AND  SIGNAL  PROCESSING,  VOL.  ASSP-29,  NO.  6,  DECEMBER  1981 


A Theoretical Analysis of the Properties of Median Filters

NEAL C. GALLAGHER, JR., MEMBER, IEEE, AND GARY L. WISE, MEMBER, IEEE


Abstract: Necessary and sufficient conditions for a signal to be
invariant under a specific form of median filtering are derived. These
conditions state that a signal must be locally monotone to pass through
a median filter unchanged. It is proven that the form of successive
median filtering of a signal (i.e., the filtered output is itself again filtered)
eventually reduces the original signal to an invariant signal called a root
signal. For a signal of length L samples, a maximum of (L - 2)/2
repeated filterings produces a root signal.

Manuscript received December 7, 1979; revised April 16, 1981. This
work was supported by the Air Force Office of Scientific Research
under Grants AFOSR 78-3605, AFOSR 76-3602, and AFOSR 81-0047.

N. C. Gallagher, Jr. is with the School of Electrical Engineering,
Purdue University, W. Lafayette, IN 47907.

G. L. Wise is with the Department of Electrical Engineering,
University of Texas, Austin, TX 78712.


I.  Introduction 

In many signal processing applications, a method called median
filtering has achieved some very interesting results.
One useful characteristic of median filtering is its ability to
preserve signal edges while filtering out impulses. Promising
applications of median filtering are picture processing and
speech processing [1]-[3]. The implementation of a median
filter requires a very simple digital nonlinear operation. To
begin, we take a sampled and quantized signal of length L;
across this signal we slide a window that spans 2N + 1 signal
sample points. The filter output is set equal to the median
value of these 2N + 1 signal samples, and is associated with
the time sample at the center of the window.


GALLAGHER AND WISE: PROPERTIES OF MEDIAN FILTERS


Fig. 1. Signal filtered by three different median filters: (a) N = 1, (b)
N = 2, and (c) N = 3.


In one form of median filtering, to account for startup and
end effects at the two endpoints of the L-length signal, N samples
are appended to the beginning and the end of the sequence.
The appended samples are constant and equal in value
to the first and last samples of the original sequence, respectively.
For other ways of treating the start-up problem that
give less emphasis to the first and last values encountered,
see  K  p.  221]. 
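
The filtering operation with constant end extension described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation; the function name `median_filter` and its argument names are assumptions introduced here.

```python
from statistics import median

def median_filter(signal, N):
    """One pass of a (2N+1)-point running median.

    N copies of the first sample are appended to the beginning and N copies
    of the last sample to the end, so the output has the same length as the
    input and each output is the median of the window centered on it.
    """
    if not signal:
        return []
    extended = [signal[0]] * N + list(signal) + [signal[-1]] * N
    window = 2 * N + 1
    return [median(extended[i:i + window]) for i in range(len(signal))]

# A single-sample impulse is removed, while an edge passes unchanged.
impulse_out = median_filter([0, 0, 1, 0, 0], 1)
edge_out = median_filter([0, 0, 1, 1, 1], 1)
```

Because the window length is odd, the median is always one of the sample values, so quantized inputs remain quantized.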

As an example, consider the binary valued sequence of
Fig. 1(a) where L = 10 and N = 1; the median filtered signal
is plotted below the extended input signal. The appended
values are marked as X's. Fig. 1(b) illustrates the filtering of
the same input signal as for Fig. 1(a), but we set N = 2; we set
N = 3 for the example in Fig. 1(c). The signal of Fig. 1 passes
undisturbed through the N = 1 filter; however, it is affected by
the N = 2 and N = 3 filters. The signal would be reduced to a
constant value by an N = 4 filter.

The results illustrated in Fig. 1 suggest the concept of a filter
"passband" and "stopband." The given signal is in the passband
of the N = 1 filter and the stopband of the N = 4 filter.
If we view the median filter as one that passes edges but not
impulses, then edges for an N = 1 filter may be impulses for
an N = 4 filter. But what about the N = 2 and N = 3 filters?
Suppose the signal of Fig. 1 is filtered twice in succession by
the N = 2 filter; in other words, the filtered output is again
filtered. The result in this specific instance is a constant output
identical to that obtained by a single filtering with an
N = 4 filter. If the constant is filtered again, the output is the
same as the filter input; the constant is invariant to median
filtering. So, by filtering this particular original signal two
times with an N = 2 or N = 3 filter, we have a resulting signal
that is invariant to successive filterings, the same result obtained
by a single pass with the N = 4 filter. Note that the
input signal of Fig. 1 is invariant to repeated filtering with
an N = 1 filter. We call such a signal a root of the median filter.
We see that signals which do not reside entirely within the
filter "passband" can be reduced to their passband component
by repeated filterings.

In this paper, we will formalize the concepts of filter passband
and stopband. We describe desirable signal characteristics
for signals employed in median filtering, and show how
some types of noise can be completely removed by median
filtering and how other types cannot be removed. These results
will be presented through the development of a formal
theory of median filtering. In Section II we present some
basic definitions that allow us to precisely state and prove a
number of interesting results. The reader concerned only with
results may wish to proceed to Section III.

II.  Theory  for  Median  Filtering 

In order to give a precise statement for the theorems presented
later in the section, a number of definitions are necessary.
We will always be working with a sample length L where
each sample is quantized to one of K different values. The filter
window length is the number of consecutive samples considered
when computing the running median. We will always
take the window length to be an odd integer (2N + 1) for N =
0, 1, 2, .... As noted earlier, our convention is that the filter
output at a given position is the median value obtained when that
position is in the center of the window. We define the following
signal characteristics.

1)  A  constant  neighborhood  is  at  least  N  +  1  consecutive 
identically  valued  points  such  that  the  constant  neighborhoods 
and  edge  together  are  monotone. 

2)  An  edge  is  a  monotonic  region  between  two  constant 
neighborhoods  of  different  value.  The  connecting  monotonic 
region  cannot  contain  any  constant  neighborhood. 

3) An impulse is a constant neighborhood followed by at
least one, but no more than N points which are then followed
by another constant neighborhood having the same value as
the first constant neighborhood. The two boundary points of
these at most N points do not have the same value as the two
constant neighborhoods.

4)  An  oscillation  is  a  sequence  of  points  which  is  not  part  of 
a  constant  neighborhood,  an  edge,  or  an  impulse. 

Of particular interest is the class of signals that can pass
through the filter unchanged, as well as the class of signals that
are completely removed by filtering. Assume that an L-length
signal is filtered with a 2N + 1 window. As noted previously,
we always append to the beginning of the signal an additional
N constants equal in value to the first sample of the signal.
Similarly, N constant points are appended to the end of the
L-length signal. By doing this, we assure that when the initial
signal's first or last sample is in the center of the window, the
median filter output equals this sample value. For a signal to
pass through a median filter unchanged means that the central
sample value for each window position is itself the median of
the samples within the window.

Consider a signal that is unchanged by median filtering. Assume
that the window increments from sample to sample moving
from left to right across the signal and that the window is
now centered at the second signal sample of the original signal.
We know that the N points to the left of center have the same
constant value. If they equal the value of the center point,
then it (the center point) must be the median. If they are less
than the value of the center point, then the N points to the
right of center must be all greater than or equal to the central
value. If the N points to the left are greater in value than the
central point, then the N points to the right are all less than or
equal to the center value. Thus, note that the leftmost N + 2
points in the window form a monotone sequence of points.
Increment the window another sample to the right, so that the
window is now centered at the third signal sample. The leftmost
N + 1 samples in the window form a monotone sequence.
Assume that the N leftmost points in the window are not
greater than (respectively, not less than) the center point.
Then, since the center point is the median value of the points
in the window, the N rightmost points in the window must be
not less than (respectively, not greater than) the center point.
Thus, we see once again that the leftmost N + 2 points in the
window form a monotone sequence. Increment the window
another sample to the right. By applying the same argument
as before, we again find that the N + 2 leftmost points in the
window form a monotone sequence. Indeed, a straightforward
inductive argument proves that the leftmost N + 2 points in
the window form a monotone sequence regardless of the window
position. Recalling that the extended signal has N constant
points appended to the right of the original signal, we
see that the extended signal is such that any consecutive N + 2
points must be monotone. Thus, a signal invariant to median
filtering must be such that the extended signal contains only
constant neighborhoods and edges.

Now assume that the extended signal contains only constant
neighborhoods and edges. If the center of the window is at
any signal sample, then the points in the window are either
monotone or nonmonotone. If the points are monotone, then
the signal sample at the center of the window is not changed
by the median filter. If they are nonmonotone, then the window
must be centered on a point in the constant neighborhood
shared by two edges. Of the 2N + 1 points in the window,
at least N + 1 of them are equal to the center point, and
thus the center point is unchanged by median filtering.

These  observations  are  formalized  in  the  following  theorem. 

Theorem 1: Given a length-L, K-valued sequence to be median
filtered with a 2N + 1 window, a necessary and sufficient
condition for the signal to be invariant under median filtering
is that the extended signal consist only of constant neighborhoods
and edges.¹

The following corollary is a direct result of this theorem.

Corollary: For a median-filter-invariant signal to contain
both regions of increase and decrease, the points of increase
and decrease must be separated by a constant neighborhood
(at least N + 1 consecutive identical points).
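
The argument leading to Theorem 1 shows that an invariant signal must have every N + 2 consecutive points of its extended version monotone. The sketch below compares that local-monotonicity test with direct filtering on small signals. The helper names (`median_filter`, `is_monotone`, `is_locally_monotone`, `is_root`) are hypothetical, and the brute-force comparison is offered only as a sanity check on short signals, not as a proof.

```python
from itertools import product
from statistics import median

def median_filter(signal, N):
    """One pass of a (2N+1)-point running median with constant end extension."""
    extended = [signal[0]] * N + list(signal) + [signal[-1]] * N
    return [median(extended[i:i + 2 * N + 1]) for i in range(len(signal))]

def is_monotone(points):
    """True if the points are nondecreasing or nonincreasing."""
    pairs = list(zip(points, points[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

def is_locally_monotone(signal, N):
    """True if every N+2 consecutive points of the extended signal are monotone."""
    extended = [signal[0]] * N + list(signal) + [signal[-1]] * N
    span = N + 2
    starts = range(max(1, len(extended) - span + 1))
    return all(is_monotone(extended[i:i + span]) for i in starts)

def is_root(signal, N):
    """True if the signal is unchanged by one pass of the filter."""
    return median_filter(signal, N) == list(signal)

# Compare the two tests on all three-valued signals of length 1 to 6.
agree = all(
    is_root(s, N) == is_locally_monotone(s, N)
    for N in (1, 2)
    for L in range(1, 7)
    for s in product(range(3), repeat=L)
)
```

For example, `[0, 1, 1, 0]` passes both tests for N = 1 but fails both for N = 2, since its run of ones is shorter than N + 1 = 3 points.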

As a result of this theorem, it is possible to construct signals
that are invariant to median filtering. Also, given the space of
all length-L, K-valued signals S, it is possible to identify all
those signals invariant to median filtering with a 2N + 1 window.
We will call these signals the roots of the filter, and this
set of signals is denoted as $R_N$. Note that $R_N \subset S$ for any N,
and that we have the following lemma.

Lemma 1: For an L-length, K-valued set of signals S, the
root sets $R_N$ are nested such that

$$\cdots \subset R_{N+1} \subset R_N \subset \cdots \subset R_0 = S.$$

Proof: If a signal is invariant to a filter of window length
2(N + 1) + 1, then each neighborhood of N + 3 samples is
monotone. Consequently, each neighborhood of length N + 2
is monotone and the signal is invariant to a filter window of
length 2N + 1; i.e., $R_{N+1} \subset R_N$. It is trivial to verify that a
window of length 1 reproduces any signal exactly upon filtering
because the median value of a set containing just one point
is the value of that point; thus, $R_0 = S$.
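
The nesting stated in Lemma 1 can be observed directly by enumerating a small signal space. The following sketch does this for binary signals of length 6; the names `median_filter` and `root_set` and the choice L = 6, K = 2 are assumptions made purely for illustration.

```python
from itertools import product
from statistics import median

def median_filter(signal, N):
    """One pass of a (2N+1)-point running median with constant end extension."""
    extended = [signal[0]] * N + list(signal) + [signal[-1]] * N
    return [median(extended[i:i + 2 * N + 1]) for i in range(len(signal))]

def root_set(L, K, N):
    """All length-L signals over K levels that a (2N+1)-point filter leaves unchanged."""
    return {s for s in product(range(K), repeat=L)
            if tuple(median_filter(s, N)) == s}

# Root sets for window lengths 1, 3, 5, and 7 over binary signals of length 6.
R0, R1, R2, R3 = (root_set(6, 2, N) for N in range(4))
```

Here `R0` contains every signal in the space, and each later set is contained in the one before it. The signal `(0, 1, 1, 0, 0, 0)` shows the inclusion can be strict: it is a root for N = 1 but not for N = 2.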

¹ It has recently come to our attention that S. Tyan has proven a version
of this theorem in an unpublished manuscript. We have not seen a
copy of this manuscript and can only speculate as to its contents.

We have established that, for a given filter window 2N + 1
and a signal set S, there exists a root set $R_N$ of signals invariant
to filtering. For a given L-length signal s, we represent the
median-filtered version of s by $f_N(s)$ for a 2N + 1 size window.
We represent by $f_N^2(s)$ the twice filtered signal

$$f_N^2(s) = f_N[f_N(s)].$$

We define $f_N^n(s)$ as the n-times filtered signal

$$f_N^n(s) = f_N[f_N^{n-1}(s)].$$

If $s = f_N(s)$, then s is a root of the filter. We next prove that
for any signal s there exists an n such that $f_N^n(s) = r$ where r
is a root.
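
Repeated filtering until the signal stops changing can be sketched as follows. The function names and the pass limit are assumptions of this example; the limit of `len(signal)` passes is a loose safety cap, chosen here only because the abstract states that at most (L - 2)/2 filterings are needed.

```python
from statistics import median

def median_filter(signal, N):
    """One pass of a (2N+1)-point running median with constant end extension."""
    extended = [signal[0]] * N + list(signal) + [signal[-1]] * N
    return [median(extended[i:i + 2 * N + 1]) for i in range(len(signal))]

def filter_to_root(signal, N):
    """Apply the filter repeatedly; return the root and the number of passes that changed the signal."""
    current = list(signal)
    for passes in range(len(current) + 1):
        filtered = median_filter(current, N)
        if filtered == current:
            return current, passes
        current = filtered
    raise RuntimeError("no root reached within the pass limit")

# An alternating binary signal of length 8 filtered with a 3-point window.
root, passes = filter_to_root([0, 1, 0, 1, 0, 1, 0, 1], 1)
```

For this alternating input the oscillation is eroded from both ends by one sample per pass, leaving a single edge after three passes, which equals (L - 2)/2 for L = 8.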

Suppose that we are given an L-length signal s that is not a
root. Recall that N constant points are appended to the beginning
of the signal. By construction, the first original signal
point is the median of the interval for which it is the central
point. As we slide the window from left to right across the signal,
the first point to move (i.e., where the window's central
point is not the median) must, by definition, be either a point
contained in an impulse or oscillation. Suppose that it is an
impulse. By construction, an impulse has two constant neighborhoods
of equal value on either side, and every point in the
impulse is filtered to this constant value by one pass of the
filter window. Suppose that the first point to be moved is
contained in an oscillation. Let p be the location of the last
point unaffected by the median filter, and assume that the
filter is centered at this point. Then the leftmost N + 2 points
must be monotone as seen in the proof of Theorem 1. Assume
without loss of generality that they are monotone nondecreasing.
Assume that the window is now centered at the point
p + 1. By hypothesis, this point must change in value. Recall
that the leftmost N points are not greater in value than the
center point. If the N rightmost points were greater than or
equal to the center value, then this value at p + 1 would be the
median. Thus, at least one point to the right of center must
have a value less than that at p + 1. Thus, there are N + 1
points in the window not greater in value than the center
point, and the center point changes. Therefore, it changes
downward in value. Note that it can never achieve a value less
than the value of the immediately preceding constant neighborhood
because there are always at least N + 1 points contained
in the window, including that at p + 1 itself, whose values
are all greater than or equal to the constant neighborhood.

So we see that the first point that changes under filtering is
preceded by, but not necessarily adjacent to, an invariant constant
neighborhood, and the point is contained either in an
impulse or oscillation. We also see that upon filtering, the
value of this point moves closer to the value of the constant
neighborhood. There are two possibilities: the value of point
p equals the value of point p + 1, or the value of point p + 1 is
greater than that at p. In addition, it can be shown that the
value of point p + 1 is greater than the value of point p. Suppose
that the two points have the same value. As the window
increments from position p to p + 1, one point moves out of
the window on the left side and another point moves into the
window on the right. The point that moves out on the left has
a value less than or equal to that of point p + 1. Because we
know that the filtered value at p + 1 is less than the original
value, the point that moves in on the right side must also have
a value less than that at p + 1; otherwise, the value at p + 1
cannot decrease. If the value of point p + 1 is the same as that
of p, then there remain N points in the window less than or
equal to the value at p + 1 (and at p) and there also remain N
points in the window greater than or equal to the value at
p + 1; consequently, point p + 1 is the median and would not
change. Thus, the value of the first point to change must be
greater than its predecessor.

Recall what is known concerning the last consecutive point
p that is invariant to filtering. The N points in the window to
the left of the center point p are all less than or equal to p in
value; the N points to the right of p are all greater than or
equal to p in value. When the next point, p + 1, is centered in
the window, there will be at least N points less than or equal
to p in value and at least N + 1 points greater than or equal to
p in value. Therefore, the median value cannot be less than
the value of p. For convenience we summarize this as the
following.

Observation  1:  The  value  of  the  first  point  to  change  value 
during  a  median-filtering  operation  must  be  on  the  opposite 
side  of  its  predecessor  than  the  most  recent  constant  neighbor¬ 
hood,  and  the  value  of  this  point  upon  filtering  moves  toward 
the  value  of  its  predecessor,  but  does  not  move  past  this  value. 

Continuing in this fashion, consider the point p + 2, which
follows point p + 1. Note that the value at p + 2 is greater
than or equal to the value at p. As the window is incremented
to the right, p + 2 is centered in the window and a point moves
out of the window on the left. A new point enters the window
on the right. The value of this point must be either greater
than that at p or less than or equal to the value at p. If it is less
than or equal to the value of p, then there are at least N - 1
points in the window with values less than or equal to that at
p and at least N + 1 points with values greater than or equal
to that at p. Consequently, p + 2 cannot be filtered to a value
less than that at p. If the value of the new point is greater
than that at p, then, trivially, the filtered value at p + 2 cannot
be less than that at p. The same reasoning can be applied to
points p + 3, p + 4, ..., p + N. For convenience, we summarize
this as the following.

Observation  2:  After  filtering,  the  N  rightmost  points  in  the 
window  centered  at  p  must  all  have  values  equal  to  that  at  p  or 
on  the  opposite  side  of  the  value  at  p  than  the  most  recent 
constant  neighborhood. 

Consequently,  the  value  at  p  is  always  invariant  to  median 
filtering,  and,  in  addition,  the  same  argument  applies  to  any 
other (invariant) point to the left of p. Also, the point p + 1
has  one  of  two  possible  filtered  values,  as  follows. 

Observation 3: Of all the values in the window centered at
p + 1, the filtered value at p + 1 is either the value at p or the
closest value to the value at p on the opposite side from the
most recent constant neighborhood.

By using an argument similar to that just presented, we reason
that the filtered values at p + 2 through p + N are greater than or
equal to the filtered value at p + 1. If the filtered value at p + 1
is the same as the value at p, then point p + 1 is invariant to filtering
on the next pass of the window because it is not greater
than the value at p. Suppose, however, that the filtered value
at point p + 1 is greater than that at p. We must reexamine the
prefiltered point values. When p + 1 is in the window center,
the N + 1 rightmost points must all have values greater than
that at p, including the rightmost point p + N + 1. As a result,


1140 


IEEE  TRANSACTIONS  ON  ACOUSTICS,  SPEECH,  AND  SIGNAL  PROCESSING,  VOL.  ASSP-29,  NO.  6,  DECEMBER  1981 


[Figure 2 panels: Original Signal; After One Filter Pass; After Two Passes; After Three Passes (a root). Appended constant points are marked with x.]

Fig. 2. Result of repeated median filtering.


Fig. 3. Partition of the signal space S by eight roots.


when  p+  N  +  1  is  in  the  window  center,  the  leftmost  N  +  1 
points  have  values  greater  than  that  at  p  and  the  filtered  value 
at  p  +  N  +  1  must  be  greater  than  that  of  p.  Consequently,  on 
the  second  pass  of  the  window,  after  all  the  points  have  been 
filtered  once,  when  point  p  +  1  is  in  the  window  center,  the  N 
leftmost  points  are  all  less  in  value  than  that  at  p  +  1 ,  and  the 
rightmost  N  points  all  have  values  greater  than  or  equal  to  that 
at  p  +  1.  Thus,  p  +  1  is  the  median  of  the  window  and  does 
not  change  value  upon  the  second  filtering.  This  yields  the 
following. 

Observation  4:  The  first  point  to  change  value  on  a  median- 
filtering  operation  remains  invariant  upon  additional  filter 
passes. 

When the observation is made that the median-filtering operation
is independent of whether the window moves from
right to left or left to right across the signal, we see that the
properties of the first point to change value apply also to the
last point in the signal to change value. Because of the appended
constant-valued points at the front and back of the
L-length signal, the first and last signal points are invariant to
filtering. Each pass therefore permanently fixes at least one more
point at each end of the signal. Thus, at most, (1/2)(L - 2) window
passes are required to reduce the signal to a root. As a result of
the previous discussion, we have the following theorem for an
L-length signal.

Theorem 2: Upon successive median-filter window passes,
any nonroot signal will become a root after a maximum of
(1/2)(L - 2) successive filterings. Also, any nonroot signal cannot
repeat, and the first point to change value on any pass of the
filter window will remain constant upon successive window
passes.

To illustrate this characteristic of median filtering, consider
the binary-valued L = 8 signal of Fig. 2. This signal is repeatedly
filtered by use of a window length of 3 samples. The
appended constant terms are marked with x's. We see that
(1/2)(L - 2) = 3 window passes are required to reduce this signal
to a root.
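The repeated-filtering procedure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names are hypothetical, the appended constant points are assumed to repeat the first and last samples, and the alternating test signal is an assumed example that is not necessarily the signal plotted in Fig. 2.

```python
from statistics import median


def median_filter_pass(signal, n):
    """Apply one pass of a (2n + 1)-point median filter.

    Assumption: n copies of the first and last samples are appended to
    the front and back, standing in for the appended constant points.
    """
    padded = [signal[0]] * n + list(signal) + [signal[-1]] * n
    return [median(padded[i:i + 2 * n + 1]) for i in range(len(signal))]


def filter_to_root(signal, n):
    """Filter repeatedly; return the root and the number of passes that changed the signal."""
    current, passes = list(signal), 0
    while True:
        filtered = median_filter_pass(current, n)
        if filtered == current:          # invariant to filtering: a root
            return current, passes
        current, passes = filtered, passes + 1


# Hypothetical alternating binary signal, L = 8, window length 3 (n = 1).
L = 8
root, passes = filter_to_root([0, 1, 0, 1, 0, 1, 0, 1], n=1)
print(root, passes)   # [0, 0, 0, 0, 1, 1, 1, 1] 3
print((L - 2) // 2)   # bound of Theorem 2: 3
```

For this signal the bound of Theorem 2 is met with equality: each pass fixes one more point at each end until the single edge remains.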

To this point, it has always been assumed that the signal is
quantized to K levels for an L-length signal. This requirement
is not needed because an L-length signal can have, at most, L
different values even if the signal samples are not quantized to
specific values. Thus, we can always bound K from above by
the value of L, and all results stated in this paper apply to
unquantized signals.

It  should  be  noted  that  the  value  of  the  appended  constant 
points  is  not  important  for  the  key  results  of  Theorems  1  and  2 
to  be  true  with  only  slight  modification  to  their  proofs.  It  is 
only  important  that  these  values  be  constant.  It  is  possible  to 
assign  nonconstant  values  to  these  points  such  that  Theorem  1 
does  not  hold  true.  Finally,  we  also  note  that  Theorems  1  and 
2  represent  median-filter  properties  that  have  been  observed  in 
the  past  without  proof  [4,  p.  212] . 

III. Discussion

The theory developed in the preceding sections provides a
number of interesting results. First, we note that every signal
s ∈ S in the space of signals can be filtered to a unique root
with a bounded number of repeated filterings. Thus, the elements
of the root set R_N partition S, as illustrated in Fig. 3,
which shows the signal space partitioned by a root
set with eight elements. Upon repeated filtering, every signal
s ∈ S_i is filtered to root r_i ∈ R_N, and so on; we will call S_i
the ancestor set of root r_i. If a signal s requires l filter passes
to reach the root r_i, we say that s is an lth-generation ancestor
of r_i. We know from Theorem 2 that any root has, at most,
(1/2)(L - 2) ancestral generations, and we know that the root of a
signal depends on the filter window size; i.e., a root for a window
of size 3 may not be a root for a window of size 5, although
a root for a size 5 window is always a root for a size 3
window. In a loose sense, median filters are a type of low-pass
filter with an increasingly narrow passband as the window size
increases.
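The partition of the signal space into ancestor sets can be enumerated directly for short binary signals. The following is a rough sketch under stated assumptions: the function names are hypothetical, the appended constant points repeat the end samples, and only binary signals of length 8 are examined.

```python
from itertools import product
from statistics import median


def median_filter_pass(signal, n):
    """One pass of a (2n + 1)-point median filter with repeated end samples (assumed padding)."""
    padded = [signal[0]] * n + list(signal) + [signal[-1]] * n
    return tuple(median(padded[i:i + 2 * n + 1]) for i in range(len(signal)))


def filter_to_root(signal, n):
    """Return (root, generation): the root reached and the number of passes needed."""
    current, generation = tuple(signal), 0
    while True:
        filtered = median_filter_pass(current, n)
        if filtered == current:
            return current, generation
        current, generation = filtered, generation + 1


def ancestor_sets(length, n):
    """Map each root to {signal: generation} over all binary signals of the given length."""
    sets = {}
    for signal in product((0, 1), repeat=length):
        root, generation = filter_to_root(signal, n)
        sets.setdefault(root, {})[signal] = generation
    return sets


L = 8
sets3 = ancestor_sets(L, n=1)    # window size 3
sets5 = ancestor_sets(L, n=2)    # window size 5
max_generation = max(g for members in sets3.values() for g in members.values())

print(len(sets3), len(sets5))     # number of roots for each window size
print(max_generation)             # deepest ancestral generation for window size 3
print(set(sets5) <= set(sets3))   # is every size-5 root also a size-3 root?
```

Each key of `sets3` is a root and its value is the ancestor set of that root, so the dictionaries together cover all 2^L signals exactly once.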

The application of median filtering to signal smoothing problems
introduces an interesting twist to the concepts of signal
and noise. A simple median filter has no design parameters
other than window size, so long as we append N values to each


GALLAGHER  AND  WISE:  PROPERTIES  OF  MEDIAN  FILTERS 


141 


end  in  the  way  discussed.  It  cannot  be  designed  to  accommo¬ 
date  special  signal  or  noise  characteristics.  In  the  extreme  case, 
a  filter  can  completely  remove  a  signal  component,  leaving 
only  noise.  It  seems  desirable  that  a  noise-free  signal  be  a  root 
signal  in  order  that  it  is  invariant  to  median  filtering.  If  the 
root  signal  has  added  noise,  then  it  may  or  may  not  be  possible 
to  remove  the  noise  by  filtering.  Noise  that  can  be  filtered  is 
noise  that  changes  the  signal  in  such  a  way  that  the  noisy  sig¬ 
nal  is  an  ancestor  of  the  same  root.  This  noise  can  be  removed 
with  repeated  median  filtering.  However,  if  the  noisy  signal  is 
now  the  ancestor  of  a  different  root,  then  it  cannot  be  removed 
by  repeated  median  filtering.  This  property  of  either  perfect 
signal  recovery  or  false  signal  recovery  points  to  yet  another 
application of this form of median filtering: channel coding.
For  this  application,  the  root  set  R  corresponds  to  an  alphabet 
set.  The  transmitted  code  can  contain  either  roots  or  ancestors. 
In  either  case,  decoding  is  accomplished  through  repeated 
filtering. 
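The distinction between perfect and false recovery can be seen on a small example. This is only a sketch: the codeword, the two corrupted versions, and the `decode` helper are hypothetical choices for illustration, with window length 3 and the end samples repeated as the appended constants.

```python
from statistics import median


def median_filter_pass(signal, n):
    """One pass of a (2n + 1)-point median filter with repeated end samples (assumed padding)."""
    padded = [signal[0]] * n + list(signal) + [signal[-1]] * n
    return [median(padded[i:i + 2 * n + 1]) for i in range(len(signal))]


def decode(received, n=1):
    """Decode by repeated median filtering until a root is reached."""
    current = list(received)
    while True:
        filtered = median_filter_pass(current, n)
        if filtered == current:
            return current
        current = filtered


codeword = [0, 0, 0, 0, 1, 1, 1, 1]     # a root for window length 3
still_ancestor = [0, 1, 0, 0, 1, 1, 1, 1]   # sample 1 flipped by noise
other_ancestor = [0, 0, 1, 0, 1, 1, 1, 1]   # sample 2 flipped by noise

print(decode(still_ancestor) == codeword)   # True: perfect recovery
print(decode(other_ancestor))               # [0, 0, 0, 1, 1, 1, 1, 1]: false recovery
```

In the second case the flipped sample sits close enough to the edge that the noisy signal becomes an ancestor of a neighboring root, so repeated filtering converges to the wrong codeword.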

In this paper, we have established several fundamental theoretical
properties of one form of median filters. We have presented
necessary and sufficient conditions for a signal to be
invariant to median filtering, and we call these signals roots of
the filter. We have also shown that repeated filtering of any
signal results in a root signal, and have established the maximum
number of filtering operations required to reach a root.
As a result of the theory developed in this paper, a better
understanding of the potential applications, as well as the
limitations of these filters, is achieved.

Acknowledgment 

The  authors  would  like  to  acknowledge  the  many  helpful 
comments  of  J.  Tukey,  which  improved  the  readability  of  this 
paper. 

References 

[1] T. S. Huang, G. J. Yang, and G. Y. Tang, "A fast two-dimensional
median filtering algorithm," IEEE Trans. Acoust., Speech, Signal
Processing, vol. ASSP-27, pp. 13-18, Feb. 1979.

[2] N. S. Jayant, "Average- and median-based smoothing techniques
for improving digital speech quality in the presence of transmission
errors," IEEE Trans. Commun., vol. COM-24, pp. 1043-1045,
Sept. 1976.

[3] L. R. Rabiner, M. R. Sambur, and C. E. Schmidt, "Applications of
a nonlinear smoothing algorithm to speech processing," IEEE
Trans. Acoust., Speech, Signal Processing, vol. ASSP-23, pp. 552-557,
Dec. 1975.

[4] J. W. Tukey, Exploratory Data Analysis. Reading, MA: Addison-Wesley,
1977.


Neal C. Gallagher, Jr. (S'72-M'75) received the
Ph.D. degree in electrical engineering in 1974
from Princeton University, Princeton, NJ.

After being a member of the faculty of Case
Western Reserve University, Cleveland, OH, he
joined Purdue University, W. Lafayette, IN, in
1976, where he is an Associate Professor. He
has publications in the areas of numerical analysis,
digital signal processing, source coding,
and optical information processing. He is Past
President of the Central Indiana, Central Illinois,
Chicago, and South Bend section of the IEEE Information Theory
Group. He has consulted for industry and government in the areas of
real-time signal processing, spectral estimation, and holography.


Gary L. Wise (S'69-S'72-M'74) was born in
Texas City, TX, on July 29, 1945. He received
the B.A. degree summa cum laude from Rice
University, Houston, TX, in 1971 with a double
major in electrical engineering and mathematics,
and the M.S.E., M.A., and Ph.D. degrees in
electrical engineering from Princeton University,
Princeton, NJ, in 1973, 1973, and 1974,
respectively.

He is currently an Associate Professor in the
Departments of Electrical Engineering and
Mathematics at the University of Texas, Austin. His research interests
include random processes, statistical communication theory, and signal
processing.


IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-28, NO. 1, JANUARY 1982


105 




Properties of Minimum Mean Squared Error Block
Quantizers

NEAL C. GALLAGHER, JR., MEMBER, IEEE, AND
JAMES A. BUCKLEW

Abstract: Two results in minimum mean square error quantization
theory are presented. The first section gives a simplified derivation of a
well-known upper bound to the distortion introduced by a k-dimensional
optimum quantizer. It is then shown that an optimum multidimensional
quantizer preserves the mean vector of the input and that the mean square
quantization error is given by the sum of the component variances of the
input minus the sum of the variances of the output.

I.  Introduction 

Block or vector quantization deals with the representation of
multidimensional elements with a finite discrete set of values. The
values to be quantized may naturally fall into a k-dimensional
representation; typical examples are complex numbers, positional
coordinates, or state vectors. In other cases, k-dimensional vectors
are formed from blocks of k samples taken from one-dimensional
signals. In 1964 Zador published a number of very
interesting results on the properties of optimal block quantizers
for the rth moment Euclidean norm distortion measure [1].
Among Zador's contributions are the derivation of both upper
and lower bounds on the distortion introduced by the optimal
quantizer. These bounds are derived without actually finding the
optimal quantizer. Unfortunately, at some points Zador's development
is not easy to follow, and alternate derivations and
extensions by Gersho [2] and Yamada et al. [3] have recently
appeared. In Section II we present an alternate derivation of
Zador's random quantization upper bound not treated in either
[2] or [3].

In [4] Bucklew and Gallagher show that for one-dimensional
mean squared error distortion the optimum quantizer has the
property that the mean value of the quantizer output equals the
mean value of the input and also that the mean square quantization
error equals the variance of the input minus the variance of
the output. In [5] Bucklew and Gallagher prove that the same
results hold for constant step-size minimum mean squared error
quantizers. In Section III we extend these properties to
k-dimensional optimal block quantizers.

Manuscript received December 30, 1980. This work was supported by the
Air Force Office of Scientific Research under Grant AFOSR 78-3605.

N. C. Gallagher, Jr., is with the School of Electrical Engineering, Purdue
University, West Lafayette, IN 47907.

J. A. Bucklew is with the Department of Electrical and Computer Engineering,
University of Wisconsin, Madison, WI 53705.


0018-9448/82/0100-0105$00.75 ©1981 IEEE




II.  Random  Quantization  Upper  Bound 

In [2] Gersho provides a very readable derivation of Zador's
expression for quantizer distortion. To improve continuity and
readability we employ Gersho's notation. The quantizer input is a
$k$-dimensional random vector $x$ in $\mathbb{R}^k$ which is quantized to one
of $N$ levels $y_1, y_2, \ldots, y_N$ in $\mathbb{R}^k$. The space $\mathbb{R}^k$ is partitioned into
$N$ disjoint and exhaustive regions $S_1, S_2, \ldots, S_N$. The quantizer is
defined by the function $Q(x) = y_i$ if $x \in S_i$.
Note that this definition does not require that $y_i \in S_i$, although
in practice $y_i$ is usually contained in $S_i$. The performance of the
quantizer is measured by the distortion

$$D = \frac{1}{k} E\{\|X - Q(X)\|^r\}$$

where $\|\cdot\|$ denotes the usual Euclidean distance norm, the
operator $E\{\cdot\}$ denotes statistical expectation, and the input $X$ is
a $k$-dimensional random input vector. The case where $r = 2$ is the
usual mean squared distortion. The expression derived by Zador
and Gersho for the minimum distortion $D_0$ obtained by use of the
best quantizer is

$$D_0 = N^{-r/k}\, C(k, r)\, \|p(x)\|_{k/(k+r)}, \qquad (1)$$

where

$$\|p(x)\|_{\alpha} = \left[ \int p^{\alpha}(x)\, dx \right]^{1/\alpha},$$

and where the constant $C(k, r)$, called the coefficient of quantization,
is independent of the density $p(x)$ and is in general
unknown. This expression is an asymptotic result valid only for
large $N$. Two special cases for which the value of $C(k, r)$ is
known exactly are [2]

$$C(1, r) = \frac{1}{r + 1}\, 2^{-r}$$

$$C(2, 2) = \frac{5}{36\sqrt{3}}$$

Consider the density $p(x)$ having a constant value of one over
the unit volume hypercube; then $\|p(x)\|_{k/(k+r)} = 1$. In this case
(1) becomes

$$D_0 = N^{-r/k}\, C(k, r). \qquad (2)$$

So,  we  see  that  by  finding  a  bound  on  D0  we  also  bound  C(  k,  r). 
To  find  this  bound  we  choose  the  quantizer  output  levels  to  have 
a  random  distribution  uniformly  distributed  over  the  hypercube. 
For  a  particular  input  value  x,  we  find  the  closest  output  level 
and  quantize  to  that  value.  Because  this  quantizer  is  not  the 
optimum  quantizer,  the  associated  distortion  will  bound  from 
above  the  distortion  for  the  optimum  quantizer. 

To begin, place at random $N$ independent uniformly distributed
$k$-dimensional samples in the hypercube. These will be the
output levels. We take the quantizer input $X$ to have a uniform
distribution over the hypercube. We also assume that $N$ is sufficiently
large so that there is a very small probability that the
quantizer input is closer to an edge of the hypercube than to one
of the output values. Suppose that an input value $x$ has arrived
and is sitting in the hypercube waiting to be quantized. The
probability that one particular output value is within a distance $\rho$
of this input sample is given approximately by the volume of a
sphere of radius $\rho$ about that sample point, or

$$\Pr(\text{one particular output level is within } \rho \text{ of the input sample}) \approx V_k \rho^k,$$

where if $V_k$ is the volume of the unit radius sphere, then $V_k \rho^k$ is the
volume of the sphere with radius $\rho$. We are interested in the
closest output level to the input sample. To compute the probability
that the closest output level is within a distance $\rho$ of the
input sample, we combine classical order statistics with the result
found in [3]. By employing this approach, we compute the
probability density $f(\rho)$ for the distance between the input sample
and the nearest output level to be

$$f(\rho) = N \left[ 1 - V_k \rho^k \right]^{N-1} k V_k \rho^{k-1}.$$
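The order-statistics step behind this density can be written out as follows. This is a sketch under the assumptions already made in the text (the $N$ output levels are independent and uniform over the unit-volume hypercube, and edge effects are ignored); the symbol $\rho_{\min}$ for the distance to the nearest output level is introduced here only for illustration. The nearest level lies farther than $\rho$ exactly when all $N$ levels do.

```latex
\begin{align*}
\Pr\{\rho_{\min} > \rho\} &= \left(1 - V_k \rho^k\right)^{N}, \\
F(\rho) = \Pr\{\rho_{\min} \le \rho\} &= 1 - \left(1 - V_k \rho^k\right)^{N}, \\
f(\rho) = \frac{dF(\rho)}{d\rho} &= N \left[1 - V_k \rho^k\right]^{N-1} k V_k \rho^{k-1}.
\end{align*}
```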

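Note that for large values of $N$ this probability density goes to
zero rapidly as $\rho$ increases. By construction $\rho = \|x - y_i\|$, where
$x$ is the input value and $y_i$ is the output value. Consequently,

$$E\{\|X - Q(X)\|^r\} = E\{\rho^r\};$$

$$D = \frac{1}{k} E\{\rho^r\} = \frac{1}{k} \int_{\text{hypercube}} \rho^r N \left[ 1 - V_k \rho^k \right]^{N-1} k V_k \rho^{k-1}\, d\rho.$$

Make the change of variables $s = V_k \rho^k$ and use the fact that
$r \geq 1$ to write

$$D = \frac{N}{k V_k^{r/k}} \int_0^1 s^{r/k} (1 - s)^{N-1}\, ds = \frac{N}{k V_k^{r/k}}\, \frac{\Gamma(1 + r/k)\, \Gamma(N)}{\Gamma(N + 1 + r/k)},$$

where $\Gamma(\cdot)$ is the gamma function. For large $N$ the following
approximation is valid:

$$\frac{\Gamma(N)}{\Gamma(N + 1 + r/k)} \approx N^{-(1 + r/k)}.$$

Therefore,

$$D = \frac{N^{-r/k}\, \Gamma(1 + r/k)}{k V_k^{r/k}}.$$

Because $D \geq D_0$, we use (2) to write

$$C(k, r) \leq \frac{\Gamma(1 + r/k)}{k V_k^{r/k}},$$

which is Zador's random quantization upper bound.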
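As a numerical sanity check, the bound can be compared with the two cases in which the coefficient of quantization is known exactly. The sketch below assumes the bound has the form $\Gamma(1 + r/k) / (k V_k^{r/k})$, the exact values $C(1, r) = 2^{-r}/(r + 1)$ and $C(2, 2) = 5/(36\sqrt{3})$ reported by Gersho, and the standard formula $V_k = \pi^{k/2} / \Gamma(k/2 + 1)$ for the volume of the unit-radius sphere; the function names are hypothetical.

```python
import math


def unit_ball_volume(k):
    """Volume V_k of the unit-radius sphere in k dimensions."""
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1)


def random_quantization_bound(k, r):
    """Assumed form of the upper bound on C(k, r): Gamma(1 + r/k) / (k * V_k**(r/k))."""
    return math.gamma(1 + r / k) / (k * unit_ball_volume(k) ** (r / k))


def exact_c_one_dim(r):
    """Exact coefficient for k = 1: C(1, r) = 2**(-r) / (r + 1)."""
    return 2.0 ** (-r) / (r + 1)


EXACT_C_2_2 = 5 / (36 * math.sqrt(3))   # exact coefficient for k = 2, r = 2

cases = [(1, 2, exact_c_one_dim(2)), (2, 2, EXACT_C_2_2)]
for k, r, exact in cases:
    bound = random_quantization_bound(k, r)
    print(f"k={k} r={r} exact={exact:.4f} bound={bound:.4f}")
```

In both cases the randomly placed output levels give a bound that lies above the exact coefficient, as expected for a non-optimum quantizer.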


III.  Moment  Properties  of  Optimum  Quantizers 

In [4] and [5] it is shown that, for minimum mean squared error
one-dimensional quantizers, the mean of the input equals the
mean of the output and the distortion equals the variance of the
input minus the output variance. These properties are shown to
apply with and without the equal step-size constraint. In this
section we generalize these results to the $k$-dimensional case.
We are interested in the properties of quantizers designed to
minimize the distortion of Section II for $r = 2$:

$$D = \frac{1}{k} E\{\|X - Q(X)\|^2\}.$$

Many constraints we might impose on the quantizer can be
imposed by the functional form of $Q(x)$; for example, the
$k$-dimensional version of the equal step-size condition might
require the regions $S_1, S_2, \ldots, S_N$ to have equal volume and be
congruent. We had originally employed a variational approach to
obtain the results of this section; however, an alternate approach,
suggested by an anonymous reviewer, provides more intuition
into quantizer structure. So, we employ his method.




To begin, we define the parameters $P_i$ and $x_i$ as follows:

$$P_i = \int_{S_i} p(x)\, dx,$$

and

$$x_i = P_i^{-1} \int_{S_i} x\, p(x)\, dx. \qquad (3)$$

We note that the partition $\{S_i\}_{i=1}^{N}$ need not be the optimum partition.
Consider two different quantizers defined over the partition
$\{S_i\}_{i=1}^{N}$, one with output values $x_i$ and one with output values $y_i$.
These quantizer functions are represented as $Q_0(X) = x_i$ and
$Q(X) = y_i$, respectively. It will be shown that the quantizer
$Q_0(X)$ is optimum for the given partition. We have that

$$E\{\|X - Q(X)\|^2\} = \sum_{i=1}^{N} \int_{S_i} \|x - x_i + x_i - y_i\|^2\, p(x)\, dx. \qquad (4)$$

By (3), we have

$$\int_{S_i} (x - x_i)^T (x_i - y_i)\, p(x)\, dx = 0;$$

therefore, (4) becomes

$$E\{\|X - Q(X)\|^2\} = E\{\|X - Q_0(X)\|^2\} + \sum_{i=1}^{N} P_i \|x_i - y_i\|^2. \qquad (5)$$

The expression in (5) illustrates that the quantizer $Q_0(X)$ produces
an error no larger than any other quantizer $Q(X)$ for a
given partition. Also, by (3) we see that the mean of the quantizer
outputs equals the mean value of the input; this follows by

$$\sum_{i=1}^{N} x_i P_i = \sum_{i=1}^{N} \int_{S_i} x\, p(x)\, dx = \int x\, p(x)\, dx, \qquad (6)$$

where the left side is the mean of the output and the right side the
mean of the input. It can also be shown that the quantizer error
equals the variance of the input minus the variance of the output.
Consider the input variance

$$E\{\|X - E\{X\}\|^2\} = E\{\|X - Q_0(X) + Q_0(X) - E\{X\}\|^2\}
= E\{\|X - Q_0(X)\|^2\} + E\{\|Q_0(X) - E\{X\}\|^2\}, \qquad (7)$$

where as before the cross terms are zero. The right side of (7) is
simply the sum of the quantizer error and the output variance.
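The two moment properties can be checked numerically on an empirical distribution, where they hold exactly up to rounding. The sketch below is illustrative only: the Gaussian samples, the unit-square partition, and all function names are hypothetical, and the sums are left without the $1/k$ normalization of the distortion.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def grid_cell(x):
    """Hypothetical, deliberately non-optimal partition: unit squares of the plane."""
    return (math.floor(x[0]), math.floor(x[1]))


def centroid_quantizer_moments(samples, cell_of):
    """Quantize each sample to the centroid of its cell and return the moments."""
    cells = {}
    for x in samples:
        cells.setdefault(cell_of(x), []).append(x)
    centroids = {cell: mean_vector(members) for cell, members in cells.items()}
    outputs = [centroids[cell_of(x)] for x in samples]
    n = len(samples)
    mean_in, mean_out = mean_vector(samples), mean_vector(outputs)
    error = sum(squared_distance(x, q) for x, q in zip(samples, outputs)) / n
    var_in = sum(squared_distance(x, mean_in) for x in samples) / n
    var_out = sum(squared_distance(q, mean_out) for q in outputs) / n
    return mean_in, mean_out, error, var_in, var_out


random.seed(0)
samples = [(random.gauss(0, 1), random.gauss(0, 2)) for _ in range(5000)]
mean_in, mean_out, error, var_in, var_out = centroid_quantizer_moments(samples, grid_cell)
print(mean_in, mean_out)          # the two mean vectors agree
print(error, var_in - var_out)    # error equals input variance minus output variance
```

Because the partition here is arbitrary, the check also illustrates that the properties depend only on the centroid choice of output values, not on the optimality of the partition.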

Equations (6) and (7) specify the first and second moment
properties of the optimum quantizer; these properties follow
regardless of the optimality of the partition. In addition, it is
noteworthy that the optimum quantizer is not unique. A simple
example serves to illustrate this point. Consider a two-dimensional
circularly symmetric input density. Any rotation of a minimum
error quantizer is also a minimum error quantizer. The same
property holds for one-dimensional quantizers, where it is possible
to have more than one minimum error quantizer.

IV. Summary

This correspondence contains two results dealing with the
properties of k-dimensional minimum mean squared error quantizers.
We have established necessary conditions for optimum
quantizers. These conditions are used to show that for
k-dimensional quantizers the mean value of the input is preserved
in the output and that the mean squared error equals the input
variance minus the output variance. Also, a simplified derivation
of Zador's random quantization upper bound is developed.

References 

[1] P. Zador, "Development and evaluation of procedures for quantizing
multivariate distributions," Ph.D. dissertation, Stanford Univ., Stanford,
CA, 1964; University Microfilms Inc. no. 64-9855.

[2] A. Gersho, "Asymptotically optimal block quantization," IEEE Trans.
Inform. Theory, vol. IT-25, pp. 373-380, July 1979.

[3] Y. Yamada, S. Tazaki, and R. M. Gray, "Asymptotic performance of
block quantizers with difference distortion measures," IEEE Trans.
Inform. Theory, vol. IT-26, pp. 6-14, Jan. 1980.

[4] J. A. Bucklew and N. C. Gallagher, "A note on optimal quantization,"
IEEE Trans. Inform. Theory, vol. IT-25, pp. 365-366, May 1979.

[5] J. A. Bucklew and N. C. Gallagher, "Some properties of uniform step
size quantizers," IEEE Trans. Inform. Theory, vol. IT-26, pp. 610-613,
Sept. 1980.


The  Binary  Multiplying  Channel— A  Coding  Scheme 
that  Operates  Beyond  Shannon’s 
Inner  Bound  Region 

J. PIETER M. SCHALKWIJK, SENIOR MEMBER, IEEE

Abstract: Blackwell's binary multiplying channel is well known as an
example of a two-way channel for which Shannon's inner and outer bounds
to the capacity region differ. A deterministic coding scheme is given which
outperforms the inner region for this channel. Dueck had earlier obtained
an analogous result for another type of two-way channel.

I. Introduction

Shannon [1] derived inner and outer bounds to the capacity
region of the two-way channel (TWC). A TWC (see Fig. 1) is a
discrete memoryless channel with finite input and output alphabets
and defined by a matrix $[P(y_1, y_2 \mid x_1, x_2)]$ of transition
probabilities. Shannon's inner bound region equals the convex
hull of the region of rate pairs $(I(X_1; Y_2 \mid X_2), I(X_2; Y_1 \mid X_1))$,
where the input distribution $P(x_1, x_2)$ is allowed to vary over all
product distributions $P(x_1, x_2) = P(x_1)P(x_2)$. Likewise, the
Shannon outer bound is the convex hull of the region of rate
pairs $(I(X_1; Y_2 \mid X_2), I(X_2; Y_1 \mid X_1))$, when the input distribution
$P(x_1, x_2)$ is no longer restricted to be of the product type.

Blackwell's binary multiplying channel (BMC), which is a
TWC satisfying $Y_1 = Y_2 = X_1 X_2$, is an example of a simple TWC
for which the inner and outer regions differ. In Fig. 8 we have
reproduced from [1] the boundary $G_I$ of the inner region and the
boundary $G_O$ of the outer region for the BMC. (See [1] for explicit
equations specifying these regions.) We show that each point on
the third curve in Fig. 8 can be achieved by a certain deterministic
coding scheme. Consequently the inner region for the BMC is
not the capacity region. (An analogous result had been obtained
earlier by Dueck [2] for a TWC which was not a BMC.) For the
sake of simplicity, in the next section we first describe the coding
scheme which achieves the point on our curve for which $R_1 = R_2$.

II. The Coding Strategy

The senders try to send information that without loss of
generality can be taken as the location of a subinterval [3], [4], of

Manuscript received October 6, 1980; revised April 15, 1981. This paper was
presented at the 1981 International Symposium on Information Theory, Santa
Monica, CA, Feb. 9-12.

The author is with the Department of Electrical Engineering, Eindhoven
University of Technology, Den Dolech 2, P.O. Box 513, 5600 MB Eindhoven,
The Netherlands.


0018-9448/82/0100-0107$00.75 ©1981 IEEE


NONUNIFORM MULTIDIMENSIONAL QUANTIZATION


by

Kerry D. Rines
TASC

8301 Greensboro Dr., Suite 1200
McLean, VA 22102

and

Neal C. Gallagher, Jr.

School of Electrical Engineering
Purdue University
West Lafayette, IN 47907

and

James A. Bucklew

Department of Electrical and Computer Engineering
University of Wisconsin
Madison, Wisconsin 53705


We have shown in a previous paper that an optimum
quantizer can be designed for the random vector
X, when X is uniformly distributed. However,
finding an optimum quantizer when X has an arbitrary
density function is in general very difficult.
Thus in this paper we consider the design of
near-optimum quantizers for X when the density is
nonuniform. The results show that if we allow the
number of quantization levels to be large, we can
obtain a distortion performance arbitrarily close
to the distortion of the optimum quantizer. The
results also provide a useful tool for the companding
design of optimum quantizers discussed later in
the paper.

I.  Introduction 

A number of authors [1]-[4] have examined the
advantages of multidimensional quantization over
univariate quantization. Unfortunately multidimensional
quantizers are difficult to design and must
usually be implemented using a search procedure.
The disadvantage of a search implementation is that
the storage and computation requirements increase
with the number of quantization levels and the dimension
of the quantizer. In a previous paper [5],
we present a method called prequantization for the
design of optimum uniform multidimensional quantizers
without the drawbacks of a search. Now in this
paper we extend Bennett's companding results [6] to
k dimensions for the design of nonuniform multidimensional
quantizers. These new methods also avoid
problems associated with a search.

II.  Piecewise  Companding 
Let  plx)  be  the  probability  density  function 
Ipdf)  of  the  vector  X.  We  begin  by  constructing  a 
density  glx)  that  is  7  piecewise  constant  approxi¬ 
mation  of-  plx).  Let  S  be  the  compact  support  of 
both  pU)  and  glO.  We  partition  S  into  M  compact 
regions  each  denoted  by  C.  and  with  area  (measure) 

m .  for  i  *  1,2,. ..,N.  The  density  is  then  defined 

as 

pi 

glx)  ■  — —  ,  i  *  1,  2,  ...,  M 


where 

P ,  =  /  P<x)dx  . 

Ci 

Now compare the quantization of the random vectors X and Y, where Y has
the density $g(x)$. We define $Q_o$ as the optimum quantizer for X given
$p(x)$ and $Q_g$ as the optimum quantizer for Y given $g(x)$. Zador's
equation for the minimum per sample distortion of X is

$$ D_o = \frac{1}{k} E\{\|X - Q_o(X)\|^r\} = C(k,r)\, N^{-r/k}\, \|p\|_{k/(k+r)} \qquad (1) $$

where

X : k-dimensional vector

$Q_o(X)$ : quantized output of $Q_o$

N : number of quantization levels (assumed large)

C(k,r) : constant dependent only on k and r

$\|p\|_{\alpha} = \left[ \int p^{\alpha}(x)\,dx \right]^{1/\alpha}$ .

Similarly the optimum distortion corresponding to the random vector Y is

$$ D_g = \frac{1}{k} E\{\|Y - Q_g(Y)\|^r\} = C(k,r)\, N^{-r/k}\, \|g\|_{k/(k+r)} . \qquad (2) $$

Using (1) and (2) and Bennett's integral for mismatched quantizers, we
can show that a near-optimum quantizer for the random vector X can be
designed by finding an optimum quantizer for a random vector with the
density $g(x)$. As the approximation of $p(x)$ by $g(x)$ becomes more
accurate ($M \to \infty$), the distortion approaches the optimum
distortion. Given this background, we now examine the design of
optimum quantizers for random vectors with piecewise constant
densities.
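The construction of $g(x)$ and the comparison of (1) with (2) can be sketched numerically. The block below is only an illustration under stated assumptions: it takes k = 1 and r = 2 (so k/(k+r) = 1/3), uses the density p(x) = 2x on [0, 1] as a stand-in example that does not come from the paper, and uses equal-length cells. Since (1) and (2) share C(k,r) and N, the ratio of the two norms is the ratio $D_g / D_o$.

```python
# Sketch: piecewise-constant approximation g of a 1-D density p, and the
# Zador norm ratio ||g||_a / ||p||_a with a = k/(k+r), k = 1, r = 2.
# The example density p(x) = 2x on [0, 1] is an assumption for illustration.

ALPHA = 1.0 / 3.0  # k/(k+r) with k = 1, r = 2


def cdf(x):
    """CDF of the illustrative density p(x) = 2x on [0, 1]."""
    return x * x


def piecewise_constant(num_cells):
    """Return (edges, heights) of g(x) = p_i / m_i on equal cells of [0, 1]."""
    edges = [i / num_cells for i in range(num_cells + 1)]
    heights = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p_i = cdf(hi) - cdf(lo)   # probability mass of cell C_i
        m_i = hi - lo             # measure (length) of cell C_i
        heights.append(p_i / m_i)
    return edges, heights


def zador_norm_g(num_cells):
    """||g||_alpha = (sum_i m_i * g_i**alpha) ** (1/alpha)."""
    edges, heights = piecewise_constant(num_cells)
    total = sum((hi - lo) * h ** ALPHA
                for lo, hi, h in zip(edges[:-1], edges[1:], heights))
    return total ** (1.0 / ALPHA)


# Closed form for p(x) = 2x: the integral of (2x)^(1/3) on [0, 1]
# equals 2^(1/3) * 3/4.
NORM_P = (2.0 ** ALPHA * 0.75) ** (1.0 / ALPHA)

ratio_4 = zador_norm_g(4) / NORM_P     # coarse partition, M = 4
ratio_64 = zador_norm_g(64) / NORM_P   # finer partition, M = 64
print(ratio_4, ratio_64)
```

For this example the ratio stays at or above one and moves toward one as M grows, which is the behavior the text describes for $M \to \infty$.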

We design the optimum quantizer for the density $g(x)$ by finding the
number of quantization levels that must be assigned to each partition
$C_i$. The


This work was presented at the 1982 Conference on Information Sciences and Systems,
Princeton University.


We now have a method called piecewise companding for designing
near-optimum quantizers for X given $p(x)$. With this method the
support S is first partitioned into M regions and then each region is
quantized using an optimum uniform multidimensional quantizer with the
number of quantization levels specified in (7).

III. Optimal Companding

A number of important properties of optimal companding have been
examined in the literature. However, to the authors' knowledge an
example of an optimum k-dimensional compander has never been presented.
In this section we construct an optimum 2-dimensional quantizer using
companding. The example adds insight into the companding problem and
suggests general guidelines for the companding design of optimum
k-dimensional quantizers.

Bennett [6] was the first to use companding to design a nonuniform
1-dimensional quantizer. The structure of a typical companding system
is shown in Figure 1. The input is first compressed by the nonlinearity
$f(x)$ and quantized with a uniform quantizer. The uniformly quantized
value is then expanded by the nonlinearity $f^{-1}(x)$. Bennett's work
was later extended by Panter and Dite [7]. Panter and Dite derive an
expression that can be used to design the optimum companding functions
given the input density function and assuming N is large. As a result
it is a relatively simple task to design a companding system for an
optimum nonuniform 1-dimensional quantizer.

Figure 1 Typical companding system.

In [8] Bucklew shows that the companding design can be extended to k
dimensions. For k dimensions, the uniform quantizer in Figure 1 becomes
the optimum uniform k-dimensional quantizer. Similarly the compressor
and expander functions become k-dimensional invertible nonlinearities.
Bucklew shows that the optimum compressor and expander functions must
be conformal almost everywhere. As it turns out, this restriction
severely limits our ability to design optimum companding systems.
However, using the results of Section II and the idea of conformality,
we can construct an example of an optimum compander.

In practice we would be given a density function and asked to design
the optimum compander. To construct this example we consider the
problem in reverse. First we choose a compander that satisfies the
conformality constraints and then we find the probability density
function for which the compander is optimum.

Let (U,V) be a random vector with the density function $p(u,v)$. For
convenience let the support of $p(u,v)$ be the set
$S = \{(u,v): 1 < u^2 + v^2 < e^{2\pi},\ v > 0\}$ as shown in Figure 2.
Now consider the 2-dimensional conformal map $w = e^z$ where
$w = u + iv$ and $z = x + iy$. We define the compressor function as

$$ x = \ln\sqrt{u^2 + v^2}, \qquad y = \tan^{-1}\frac{v}{u} \qquad (8) $$

and we define the expander function as

$$ u = e^x \cos y, \qquad v = e^x \sin y . \qquad (9) $$

Figure 2 Support of density p(u,v). (Not to scale.)

The vector (U,V) is mapped into the square $(0,\pi) \times (0,\pi)$ in
the Z-plane by the compressor function in (8). The resulting vector
(X,Y) is quantized using the optimum uniform 2-dimensional (hexagonal)
quantizer. Then the output from the hexagonal quantizer is mapped back
into the W-plane using the expander function in (9). We now must find
the density $p(u,v)$ for which this quantizer is optimum.

We begin with a piecewise companding design for the unknown density
$p(u,v)$. The support S of $p(u,v)$ is partitioned into M equal-sized
regions $C_i$, each with area $m_i = \Delta u \Delta v$. Using (7), the
optimum number of quantization levels for each partition is given by

$$ N_i = N \frac{p_i^{1/2}}{\sum_j p_j^{1/2}} \qquad (10) $$

where

$$ p_i = \int_{C_i} p(u,v)\,du\,dv . $$

We implement the piecewise compander as follows. First, we find the
partition that contains the random vector (U,V). Then for each
partition $C_i$, (U,V) is quantized using an $N_i$-level hexagonal
quantizer.

We compare this implementation of the piecewise compander with the
companding system described in (8) and (9). The compressor function in
(8) maps each partition $C_i$ into a new partition $C_i'$ in the
$(0,\pi) \times (0,\pi)$ square of the Z-plane. The partition $C_i'$ is
then quantized using the hexagonal quantizer discussed above. Let
$N_i'$ be the number of quantization levels contained within $C_i'$.
For N large, we can consider the hexagonal quantization levels to be
uniformly distributed within $0 \le x, y < \pi$. Thus, $N_i'$ will be
given by N times the ratio of the area of $C_i'$ to the total area of
the square. If we let $k_i \Delta x \Delta y$ be the area of $C_i'$,
the number of levels $N_i'$ is given by

$$ N_i' = \frac{N k_i \Delta x \Delta y}{\pi^2} . \qquad (11) $$

The expander function in (9) maps the $N_i'$ quantization levels in
$C_i'$ into the partition $C_i$ in the W-plane. Since the mapping is
nonlinear, the quantization levels will no longer be in the form of a
hexagonal lattice. However, the quantization will be approximately
hexagonal when the area of $C_i$ is small.

We now assume there exists a density $p(u,v)$, continuous almost
everywhere, such that the number of quantization levels $N_i$ in (10)
is equal to $N_i'$. Thus for N large and $\Delta u \Delta v$ small, the
distortion of the companding system in (8) and (9) is approximately
equal to the distortion of the piecewise compander. Setting
$N_i' = N_i$ we obtain

$$ k_i \Delta x \Delta y = \pi^2 \frac{p_i^{1/2}}{\sum_j p_j^{1/2}} . \qquad (12) $$

We can rewrite this expression as follows. As stated above, the
compressor function in (8) maps $C_i$ onto $C_i'$ for all i. Then by
definition,

$$ \int_{C_i} J_{xy}(u,v)\,du\,dv = \int_{C_i'} dx\,dy = k_i \Delta x \Delta y $$

where $J_{xy}(u,v)$ is the Jacobian of the transformation in (8). Using
this result and the definition of $p_i$ in (10), we can rewrite (12) as

$$ \int_{C_i} J_{xy}(u,v)\,du\,dv = \pi^2 \frac{\left( \int_{C_i} p(u,v)\,du\,dv \right)^{1/2}}{\sum_j \left( \int_{C_j} p(u,v)\,du\,dv \right)^{1/2}} . \qquad (13) $$

Recall from Section II that in the limit as $M \to \infty$ and
$\Delta u \Delta v \to 0$, the distortion of the piecewise companding
system approaches the distortion of the optimum quantizer for
$p(u,v)$. We can also show that for this same limiting relation, the
distortion of the companding system in (8) and (9) is equal to the
distortion of the piecewise compander. Therefore, the companding system
in (8) and (9) will be an optimum quantizer for the density $p(u,v)$
that satisfies (13) in the limit as $\Delta u \Delta v \to 0$. Dividing
both sides of (13) by $\Delta u \Delta v$ and taking the limit as
$\Delta u \Delta v \to 0$ we find

$$ J_{xy}(u,v) = \frac{\pi^2\, p^{1/2}(u,v)}{\int_S p^{1/2}(u,v)\,du\,dv} = K\, p^{1/2}(u,v) \qquad (14) $$

where K is a constant.

Computing the Jacobian of the compander in (8), we find the companding
system is optimal for the density

$$ p(u,v) = \frac{2}{\pi (1 - e^{-2\pi})} \cdot \frac{1}{(u^2 + v^2)^2} ; \quad 1 < u^2 + v^2 < e^{2\pi},\ v > 0 $$

$$ p(u,v) = 0 ; \quad \text{elsewhere} . $$

IV. Summary

We have discussed the design of optimum and near-optimum quantizers for
random vectors with nonuniform density functions. For the design of
near-optimum quantizers a piecewise companding approach was presented.
While not optimum, quantizers using piecewise companding can be
designed for random vectors having any given k-dimensional density
function.

The use of k-dimensional companding systems for optimum nonuniform
quantization was also examined. Extending the results in (14) we find
that a necessary condition on the Jacobian of the optimum compressor
function is

$$ J(u) = K\, p^{k/(k+r)}(u) $$

where r is the power of the distortion measure. While these results add
to our understanding of optimal companding, they also suggest that it
may be impossible to design an optimum companding system for all but a
few k-dimensional densities. This further underscores the importance of
the piecewise companding technique.


References

[1] P. Zador, Development and Evaluation of Procedures for Quantizing
Multivariate Distributions, Ph.D. Dissertation, Stanford University,
1964, University Microfilm No. 64-9855.

[2] A. Gersho, "Asymptotically Optimal Block Quantization," IEEE Trans.
on Inform. Theory, Vol. IT-25, pp. 373-380, July 1979.

[3] Y. Yamada, S. Tazaki, and R. M. Gray, "Asymptotic Performance of
Block Quantizers with Difference Distortion Measures," IEEE Trans. on
Inform. Theory, Vol. IT-26, pp. 6-14, January 1980.

[4] J. A. Bucklew, "Upper Bounds to the Asymptotic Performance of Block
Quantizers," to appear in IEEE Trans. on Inform. Theory.

[5] K. D. Rines and N. C. Gallagher, Jr., "The Design of
Multidimensional Quantizers using Prequantization," Proceedings of the
Eighteenth Annual Allerton Conference, pp. 446-453, October 1980.

[6] W. R. Bennett, "Spectra of Quantized Signals," B.S.T.J., Vol. 27,
pp. 446-472, July 1948.

[7] P. F. Panter and W. Dite, "Quantization in Pulse-Count Modulation
with Nonuniform Spacing of Levels," Proc. IRE, Vol. 39, pp. 44-48,
1951.

[8] J. A. Bucklew, "Companding and Random Quantization in Several
Dimensions," IEEE Trans. Inform. Theory, Vol. IT-27, pp. 207-211, March
1981.


The  authors  wish  to  acknowledge  partial  support  by  the  Air  Force  Office  of  Scientific 
Research  under  grant  AFOSR-78-3605. 
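
As a numerical cross-check of the Section III example, the following sketch evaluates the compressor (8) and expander (9), estimates the Jacobian of (8) by finite differences, and tests the relation (14) for the density stated at the end of Section III. Several things here are assumptions rather than part of the paper: the function names, the use of `atan2` so that y falls in (0, π) on the upper half-plane, the sample point (u0, v0), and the midpoint-rule integration carried out in the Z-plane (where du dv = e^{2x} dx dy).

```python
import math

# Sketch: numerical check of the Section III companding example.
# Names are illustrative; atan2 is an assumed choice of branch for
# tan^-1(v/u) so that y lies in (0, pi) when v > 0.


def compress(u, v):
    """Compressor (8): (u, v) -> (x, y)."""
    return 0.5 * math.log(u * u + v * v), math.atan2(v, u)


def expand(x, y):
    """Expander (9): (x, y) -> (u, v)."""
    return math.exp(x) * math.cos(y), math.exp(x) * math.sin(y)


def jacobian(u, v, h=1e-6):
    """Central-difference Jacobian determinant of the compressor."""
    xu = (compress(u + h, v)[0] - compress(u - h, v)[0]) / (2 * h)
    xv = (compress(u, v + h)[0] - compress(u, v - h)[0]) / (2 * h)
    yu = (compress(u + h, v)[1] - compress(u - h, v)[1]) / (2 * h)
    yv = (compress(u, v + h)[1] - compress(u, v - h)[1]) / (2 * h)
    return xu * yv - xv * yu


# Normalizing constant of p(u, v) = C / (u^2 + v^2)^2 on the support S.
C = 2.0 / (math.pi * (1.0 - math.exp(-2.0 * math.pi)))


def density(u, v):
    r2 = u * u + v * v
    inside = 1.0 < r2 < math.exp(2.0 * math.pi) and v > 0.0
    return C / (r2 * r2) if inside else 0.0


def integrate_over_support(f, n=400):
    """Midpoint rule over S, parameterized by (x, y) in (0, pi)^2."""
    step = math.pi / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step
        for j in range(n):
            y = (j + 0.5) * step
            u, v = expand(x, y)
            total += f(u, v) * math.exp(2.0 * x) * step * step
    return total


total_mass = integrate_over_support(density)
K = math.pi ** 2 / integrate_over_support(lambda u, v: math.sqrt(density(u, v)))

u0, v0 = -1.3, 2.1  # arbitrary sample point inside S
print(total_mass, jacobian(u0, v0), K * math.sqrt(density(u0, v0)))
```

Under these assumptions the density integrates to approximately one over S, and the finite-difference Jacobian agrees with $K p^{1/2}(u,v)$ at the sample point.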




IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. COM-30, NO. 1, JANUARY 1982


A  Note  on  the  Computation  of  Optimal  Minimum 
Mean-Square  Error  Quantizers 

J. A. BUCKLEW AND N. C. GALLAGHER, JR.


Abstract: This paper considers the problem associated with computing
optimal minimum mean-square error quantizers. Most computational
methods in current use are iterative. These iterative schemes are
extremely sensitive to initial conditions. Various methods of obtaining
good initial conditions are presented and discussed.

I.  INTRODUCTION 

In his classic paper of 1960, Max presents an iterative scheme for the
computation of one-dimensional minimum mean-squared error quantization
characteristics [1]. In addition, he solves for the optimum Gaussian
quantizer for up to 36 output levels. In [2], Gallagher uses Max's
method in the computation of optimum Rayleigh quantizer parameters, and
in [3] Paez and Glisson use the same method to compute the


Paper approved by the Editor for Data Communication Systems of the
IEEE Communications Society for publication without oral presentation.
Manuscript received January 5, 1981; revised April 27, 1981. This work
was supported by the Air Force Office of Scientific Research under
Grant AFOSR 78-3605.

J. A. Bucklew is with the Department of Electrical and Computer
Engineering, University of Wisconsin, Madison, WI 53706.

N. C. Gallagher, Jr. is with the School of Electrical Engineering,
Purdue University, West Lafayette, IN 47907.


optimum Laplacian quantizer, later recomputed by Adams and Giesler [4].
Max's algorithm is very simple to program into a digital computer, and
we view this simplicity as a good reason for using his method. However,
one problem that arises with this algorithm is its failure to always
converge to the optimum solution when the number of quantizer output
levels is large. The reason for this is that the initial guess for
starting the iteration must be increasingly precise as the number of
quantizer levels becomes large. So, for a 64-level quantizer, Max's
algorithm will not converge to the optimum solution unless the initial
guess for the first output level is very close to the true value. This
difficulty has prompted others to employ more sophisticated
optimization methods in the solution for optimum quantizers. For
example, Pearlman and Senge [5] use a vector space optimization
technique that is a combination of the steepest descent and
Newton-Raphson methods to solve for the optimum Rayleigh quantizer. It
is not our purpose to detract from this and similar methods that do
work well, but in our view, if the starting point problem can be
solved, Max's method is the preferred method of solution. In Section II
we discuss several methods for choosing the iteration's initial
condition very accurately. We have demonstrated convergence of Max's
algorithm for at least 10000 output levels, and we present numerical
examples in Section III.

II.  THE  COMPUTATION  OF  OPTIMUM 
ONE-DIMENSIONAL  QUANTIZERS 

A common method for implementing one-dimensional quantizers is the
companding method as discussed by Smith [6]. The companding method is
straightforward: the input signal x with probability density p(x)
first enters the invertible nonlinearity g(x), called the compressor;
then it goes into a uniform quantizer over the range [0, 1], and upon
reconstruction it passes through the expansion nonlinearity
$g^{-1}(\cdot)$. For minimum mean-squared error quantization, the
asymptotically optimum compressor function is given by

$$ g(x) = \frac{\int_{-\infty}^{x} [p(y)]^{1/3}\,dy}{\int_{-\infty}^{\infty} [p(y)]^{1/3}\,dy} . \qquad (1) $$

In Max's classic 1960 paper an iterative method is presented whereby
the exact quantizer parameters can be computed for finite N.

Max's algorithm provides a method for the solution of the equations

$$ x_i = (y_i + y_{i-1})/2, \quad i = 2, \ldots, N \qquad (2a) $$

and

$$ \int_{x_i}^{x_{i+1}} (x - y_i)\, p(x)\,dx = 0, \quad i = 1, \ldots, N \qquad (2b) $$

where the output levels of the quantizer are denoted
$y_1, y_2, \ldots, y_N$ and the internal breakpoints as
$x_1, x_2, \ldots, x_{N+1}$. Typically, the endpoint values $x_1$ and
$x_{N+1}$ are known a priori, and the first step of Max's procedure is
to choose a value for $y_1$ with which to solve (2b) for the value
$x_2$. We then use this value in (2a) to find $y_2$ and use this to
find $x_3$ in (2b), and so on. The last integral over
$(x_N, x_{N+1})$ can be used to determine the accuracy of the initial
guess for $y_1$. If the last integral is zero within a specified error,
we use the computed parameters to specify the quantizer; if not, we
make a new guess for $y_1$ and begin the procedure again. Details on
how to modify the initial guess for $y_1$ are not specified by Max.
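
The forward recursion through (2a) and (2b) can be sketched as a shooting procedure. The block below is a minimal illustration for the unit-variance Gaussian, not Max's own program: because the update rule for $y_1$ is not specified, a simple bisection on $y_1$ is assumed here, and the function names, the search bounds, and the finite stand-in `X_MAX` for the infinite upper endpoint are likewise assumptions.

```python
import math

# Sketch of Max's shooting procedure for a unit-variance Gaussian.
# The bisection on y_1 is an assumed update rule (Max does not specify
# one), and all function names here are illustrative.

SQRT2 = math.sqrt(2.0)
X_MAX = 12.0  # finite stand-in for +infinity when searching for an endpoint


def pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def mass(a, b):
    """P(a < X < b), computed in whichever tail keeps precision."""
    if a >= 0.0:
        tail_b = 0.0 if math.isinf(b) else 0.5 * math.erfc(b / SQRT2)
        return 0.5 * math.erfc(a / SQRT2) - tail_b
    head_a = 0.0 if math.isinf(a) else 0.5 * math.erfc(-a / SQRT2)
    head_b = 1.0 if math.isinf(b) else 0.5 * math.erfc(-b / SQRT2)
    return head_b - head_a


def centroid(a, b):
    """Conditional mean of N(0, 1) on (a, b); a or b may be infinite."""
    return (pdf(a) - pdf(b)) / mass(a, b)


def next_endpoint(a, y):
    """Solve (2b): find b > y with centroid(a, b) == y, or None."""
    if a > X_MAX - 1.0 or centroid(a, X_MAX) < y:
        return None  # y is too large: no endpoint exists
    lo, hi = y, X_MAX
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if centroid(a, mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def shoot(y1, n):
    """Run the forward recursion; return (mismatch, endpoints, levels)."""
    xs, ys = [-math.inf], [y1]
    for _ in range(n - 1):
        b = next_endpoint(xs[-1], ys[-1])
        if b is None:
            return 1.0, xs, ys          # levels spread out too fast
        xs.append(b)
        ys.append(2.0 * b - ys[-1])     # (2a) solved for the next level
    xs.append(math.inf)
    # Last centroid condition over (x_N, x_{N+1}), written as a mismatch.
    return ys[-1] - centroid(xs[-2], math.inf), xs, ys


def max_quantizer(n, y_lo=-8.0, y_hi=-1e-3):
    """Bisect on y_1 until the last centroid condition is met."""
    for _ in range(80):
        y1 = 0.5 * (y_lo + y_hi)
        mismatch, xs, ys = shoot(y1, n)
        if mismatch < 0.0:
            y_lo = y1
        else:
            y_hi = y1
    return xs, ys


endpoints, levels = max_quantizer(4)
print(endpoints, levels)
```

The bisection is used only because it is easy to verify for small N; it does not address the sensitivity to the starting point that the surrounding text discusses for large N.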

We have computed quantizers using Max's method for several densities.
It has been our observation that the convergence properties of Max's
algorithm are greatly dependent on the initial guess for $y_1$. Let
$y_{1N}$ denote the first output level for an optimum N level
quantizer. Intuitively, if the first guess at $y_{1N}$ (call it
$\hat{y}_{1N}$) is very close to $y_{1(N+1)}$, then Max's algorithm
tries to converge to the N + 1 level quantizer. A consideration of
Max's method indicates that the first N steps of the algorithm are the
same for the N or N + 1 level quantizers. Although never reported in
the literature, it is our understanding that this phenomenon has been
widely observed [7].

As an aside, we remark that the conditions presented in (2) are not
sufficient conditions to specify the optimum quantizer; they are only
necessary. However, in 1965 Fleischer [8] showed that if

$$ \frac{d^2}{dx^2} [\ln p(x)] < 0 $$

then the expressions in (2) are both necessary and sufficient for the
specification of the minimum mean-squared error quantizer, and their
solution provides us with the unique optimum quantizer.
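
As a brief worked check of this condition (added here as an illustration, using the unit-variance Gaussian density and the Rayleigh density $p(y) = y\,e^{-y^2/2}$ that appear in Section III), the second derivative of the log-density can be evaluated directly:

```latex
% Gaussian, unit variance, zero mean
\ln p(x) = -\tfrac{1}{2}x^2 - \tfrac{1}{2}\ln(2\pi)
\quad\Longrightarrow\quad
\frac{d^2}{dx^2}\ln p(x) = -1 < 0 .

% Rayleigh, p(y) = y\,e^{-y^2/2}, \; y > 0
\ln p(y) = \ln y - \tfrac{1}{2}y^2
\quad\Longrightarrow\quad
\frac{d^2}{dy^2}\ln p(y) = -\frac{1}{y^2} - 1 < 0 .
```

Both densities therefore satisfy the inequality on their supports, so under the stated result a solution of (2) for either one is the unique optimum quantizer.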

We now describe two similar methods for generating a good initial
condition. First, note that the initial condition can be a guess at the
value for $y_1$ or a guess for the value of any $y_i$,
$i = 1, \ldots, N$, wherever we choose to begin the iteration. The
first method is a modified version of an estimation method by Panter
and Dite [9] and Roe [10]. The second method employs a companding model
to produce the iteration starting point. Both methods grow more precise
as the number of quantization levels N increases. Each method, however,
requires computation to generate an initial value; the complexity of
this computation varies depending on the distribution of the variable
to be quantized.

In the first method we use the asymptotic level density $\lambda(x)$
for the minimum mean-squared error quantizer. $\lambda(x)\Delta x$ is
approximately the ratio of the number of output levels in a region
$\Delta x$ about x to the total number of output levels N. This
function is the first derivative of the compressor function g(x) in
(1):

$$ \lambda(x) = g'(x) = \frac{[p(x)]^{1/3}}{\int_{-\infty}^{\infty} [p(y)]^{1/3}\,dy} . \qquad (3) $$

Smith [6] shows that this function has the property that for adjacent
output levels $y_i$ and $y_{i+1}$,

$$ y_{i+1} - y_i \approx \frac{1}{N \lambda(y)} \quad \text{for } y \in (y_i, y_{i+1}] \qquad (4) $$

when the number of output levels is large. As an aside, we remark that
our compressors always have unity range. Smith allows more generality
in his formulas. The best way to illustrate the use of (4) is through
an example. Suppose that p(x) is a zero-mean symmetric density (no
Dirac delta functions), that N is even, and that a unique optimum
quantizer exists. The initial condition for the Max iteration is a
guess for the first output level greater than zero. We will call this
level $y_{N/2}$. We first make the observation that the output levels
must be symmetric about the origin. Also, for large N, the distance
between the breakpoint at zero and $y_{N/2}$ approximately equals

$$ y_{N/2} \approx \frac{1}{2N \lambda(y_{N/2})} . \qquad (5) $$

The solution of this equation provides the initial guess for
$y_{N/2}$. This basic procedure can be used with modifications for N
even or odd with most common probability densities. Some numerical
examples are provided in the next section.

The second method uses the companding function to work backwards from
the known uniform quantizer over [0, 1] in order to estimate the
initial output level. In fact, the method provides a reasonable
approximation to the entire quantizer. An N level uniform quantizer on
[0, 1] has output levels

$$ y_i = \frac{2i - 1}{2N}, \quad i = 1, \ldots, N . \qquad (6) $$

Therefore, the companding approximation is simply

$$ \hat{y}_i = g^{-1}(y_i) = g^{-1}\!\left( \frac{2i - 1}{2N} \right) . \qquad (7) $$

For the purpose of identification, we will refer to the first method of
(5) as the $\lambda$-approximation and the second as the
g-approximation. In hindsight these two methods seem obvious; however,
they have apparently not been widely used.

III. NUMERICAL EXAMPLES

In this section we provide some examples using the $\lambda$- and
g-approximations to estimate the initial input interval endpoint of a
Max quantizer. The asymptotically optimum mean-square error companding
characteristic is given by

$$ g(x) = \frac{\int_{-\infty}^{x} [p(y)]^{1/3}\,dy}{\int_{-\infty}^{\infty} [p(y)]^{1/3}\,dy} $$

where p(y) is our input probability density.

The first example we consider is when p(y) is the Gaussian unit
variance, zero mean, probability density; g(x) is then given by
$\tfrac{1}{2}(1 + \mathrm{erf}(x/\sqrt{6}))$; hence,
$g^{-1}(y) = \sqrt{6}\,\mathrm{erf}^{-1}(2y - 1)$. Using this equation,
our expression for the initial positive input interval endpoint of an N
output level quantizer is
$x_{1g} = \sqrt{6}\,\mathrm{erf}^{-1}(2(N/2 + 1)/N - 1)$.
The  X-approximation  requires  us  to  solve  the  equations 
(using  a  standard  Newton- Raphson  search) 


x ,  x  - - for  A  even 

AX(*,x) 


1 


2AX(*,x) 


for  A  odd 


300 


IEEE  TRANSACTIONS  ON  COMMUNICATIONS.  VOL.  COM- JO,  NO.  1.  JANUARY  1981 


Fig.  1. 


Pt  (solid  line)  and  Pk  (doited  line)  plotted  u  a  function  of  N 
for  the  Gauaaian  denhty. 


Fig.  2.  Pf  (solid  line)  and  P\  (dotted  Une)  plotted  as  a  function  of  N 
tot  the  Leplecien  density 


where 


Since  Max  tabulated  the  actual  values  of  the  input  interval 
endpoints,  we  may  compute  the  quantities 


MR 


*1-—  X 


act 


‘set 


and 


*1X  *act 


*act 


Fig.  3.  Pf  (solid  line)  and  P*  (dotted  line)  plotted  as  a  function  of  (V 
for  the  Rayleigh  density. 


for  various  values  of  N  where  xacl  is  the  actual  tabulated 
value. 
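
The two starting-point estimates can be sketched for the unit-variance Gaussian as follows. This is an illustration under stated assumptions, not the authors' program: the inverse error function is obtained by a bisection helper written here (Python's `math` module does not provide one), the $\lambda$-equation is solved by bisection rather than the Newton-Raphson search mentioned in the text, the g-approximation is coded for N even only, and all function names are hypothetical.

```python
import math

# Sketch: starting-point estimates for the unit-variance Gaussian.
# erfinv is a helper defined here by bisection; names are illustrative.


def erfinv(t):
    """Inverse of math.erf on (-1, 1), found by bisection."""
    lo, hi = -6.0, 6.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def level_density(x):
    """lambda(x) of (3) for the Gaussian: exp(-x^2/6) / sqrt(6*pi)."""
    return math.exp(-x * x / 6.0) / math.sqrt(6.0 * math.pi)


def x1_g(n):
    """g-approximation (N even): sqrt(6) * erfinv(2(N/2 + 1)/N - 1)."""
    return math.sqrt(6.0) * erfinv(2.0 * (n / 2 + 1) / n - 1.0)


def x1_lambda(n):
    """lambda-approximation: smallest root of x = 1/(c N lambda(x)).

    c = 1 for N even and c = 2 for N odd. Returns None when no root
    exists.
    """
    c = 1.0 if n % 2 == 0 else 2.0

    def f(x):
        return x - 1.0 / (c * n * level_density(x))

    # x * lambda(x) peaks at x = sqrt(3); if f is negative there,
    # the equation has no solution.
    peak = math.sqrt(3.0)
    if f(peak) < 0.0:
        return None
    lo, hi = 0.0, peak  # f(lo) < 0 <= f(hi), f increasing on [0, sqrt(3)]
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in (4, 6, 36):
    print(n, x1_g(n), x1_lambda(n))
```

Under these assumptions `x1_lambda(4)` returns `None`, which agrees with the remark in the discussion of Fig. 1 that the $\lambda$-approximation has no solution for N = 4 in the Gaussian case.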

In Fig. 1 we see $P_g$ (solid line) and $P_\lambda$ (dotted line)
plotted as a function of N for values of N from 5 to 36. As may be seen
from the figure, the g-approximation is better for all these values of
N. Furthermore, the $\lambda$-approximation does not have a solution
for N = 4, which is an additional drawback of using this approximation
in low N regions.

We now perform the same computations for the Laplacian
($p(y) = \exp\{-|y|\}/2$) and Rayleigh ($p(y) = y \exp\{-y^2/2\}$)
probability densities. In Fig. 2 we plot $P_g$ (solid line) and
$P_\lambda$ (dotted line) for values of N from 5 to 16 for the
Laplacian density. Again, the g-approximation is best for all values of
N and, furthermore, the $\lambda$-approximation has no solution when
N = 4.

In Fig. 3 we see plots of $P_g$ (solid line) and $P_\lambda$ (dotted
line) for values of N from 2 to 36 for the Rayleigh distribution. For
every value except N = 2, the g-approximation is better than the
$\lambda$-approximation. The plot of $P_g$ is noisy because calculation
of $x_{1g}$ for this density required a large numerical integration
which was very sensitive to the number of samples used in the
summation.

We should note that Max quantizers have been computed for the Rayleigh
and the Gaussian densities using both $x_{1g}$ and $x_{1\lambda}$ as
the estimate for the initial interval endpoint. With no convergence
problems, quantizers of 10000 and 200 output levels have been computed
for the Gaussian and Rayleigh probability densities, respectively. In
practice, we find that both methods give sufficiently good estimates to
allow quick convergence to the correct quantizer. A typical value is
200 iterations for a 1000 level Gaussian quantizer with the last level
specified to $10^{-5}$ accuracy. We conclude that the $x_{1g}$ estimate
is a better approximation in most cases, but the $x_{1\lambda}$
estimate is often substantially easier to compute.

REFERENCES

[1] J. Max, "Quantizing for minimum distortion," IRE Trans. Inform.
Theory, vol. IT-6, pp. 7-12, Mar. 1960.

[2] N. C. Gallagher, "Optimum quantization in digital holography,"
Appl. Opt., vol. 17, pp. 109-113, Jan. 1, 1978.

[3] M. D. Paez and T. H. Glisson, "Minimum mean-squared error
quantization in speech PCM and DPCM systems," IEEE Trans. Commun.,
vol. COM-20, pp. 223-230, Apr. 1972.

[4] W. C. Adams and C. E. Giesler, "Quantizing characteristics for
signals having Laplacian amplitude probability density function,"
IEEE Trans. Commun., vol. COM-26, pp. 1293-1297, Aug. 1978.

[5] W. A. Pearlman and G. H. Senge, "Optimal quantization of the
Rayleigh probability distribution," IEEE Trans. Commun., vol.
COM-27, pp. 101-112, Jan. 1979.

[6] B. Smith, "Instantaneous companding of quantized signals," Bell
Syst. Tech. J., vol. 36, pp. 653-709, May 1957.

[7] E. Delp and J. A. Bucklew, mutual correspondence, 1977.

[8] P. E. Fleischer, "Sufficient conditions for achieving minimum
distortion in a quantizer," in IEEE Int. Conv. Rec., 1964, pp.
104-111.

[9] P. F. Panter and W. Dite, "Quantization distortion in pulse-count
modulation with nonuniform spacing of levels," Proc. IRE, vol.
39, pp. 44-48, Jan. 1951.

[10] G. M. Roe, "Quantizing for minimum distortion," IEEE Trans.
Inform. Theory, vol. IT-10, pp. 384-385, Oct. 1964.


IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-28, NO. 2, MARCH 1982


On the Design of Nonlinear Discrete-Time Predictors

T. E. McCANNON, MEMBER, IEEE, NEAL C. GALLAGHER, MEMBER, IEEE,
D. MINOO-HAMEDANI, AND GARY L. WISE, MEMBER, IEEE

Abstract: The problem of minimum mean squared error prediction of a
discrete-time random process using a nonlinear filter consisting of a
zero-memory nonlinearity followed by a linear filter is studied. Classes of
random processes for which the best predictor is realizable using a
nonlinear filter of the above form are discussed. For those random
processes for which the best predictor is not realizable using the above
nonlinear filter, an iterative procedure is presented for finding a
suboptimal nonlinear filter.

Manuscript received November 17, 1980. This work was supported in part
by the Air Force Office of Scientific Research under Grants AFOSR 78-3605,
AFOSR 76-3062, AFOSR 81-0047, and AFOSR 76-3062, and in part by the
Department of Defense Joint Services Electronics Program under Grant
F49620-77-C-0101.

T. E. McCannon and N. C. Gallagher are with the School of Electrical
Engineering, Purdue University, West Lafayette, IN 47907.

G. L. Wise is with the Department of Electrical Engineering, University of
Texas at Austin, Austin, TX 78712.

D. Minoo-Hamedani was with the Department of Electrical Engineering,
University of Texas. He is now with Bell Laboratories, Holmdel, NJ 07733.



Special attention is directed to the case where the nonlinearity is a
polynomial. Also, a noniterative approach based on nonlinear regression is
presented.

I. INTRODUCTION

In this correspondence we consider a second-order random process
$\{X_n, n = 1, 2, \ldots\}$, and we are interested in predicting the
random variable $X_{N+1}$ from an observation of $X_1, \ldots, X_N$.
Our estimate is denoted by $\hat{X}_{N+1}$, and we wish to choose it so
as to minimize the mean-squared error.

It is well-known [1, pp. 77-78] that the optimal estimate of $X_{N+1}$
in terms of $X_1, \ldots, X_N$ is given by the conditional expectation

$$ \hat{X}_{N+1} = E\{X_{N+1} \mid X_N, \ldots, X_1\} . $$

In general, this is a Borel measurable function of $X_1, \ldots, X_N$.
In many cases an exact expression for this quantity is difficult to
obtain. Often we do not have the necessary statistical information to
evaluate such a quantity. In such cases, we might restrict the form of
the estimator to be linear and apply well-known techniques [2] for its
determination. Linear estimation can also be thought of as applying the
projection theorem [1, pp. 150-155] and projecting $X_{N+1}$ onto the
linear manifold generated by the observations $X_1, \ldots, X_N$.
Clearly, in this case the only statistical information required is the
second-moment characteristics of the random process.

In an attempt to improve estimation performance, we propose to modify
or augment this subspace so as to have a larger signal component
present within the subspace. A linear method cannot alter the subspace
in the manner required to achieve the desired behavior; however, a
nonlinear system can modify the subspace. So, we begin by restricting
our estimate $\hat{X}_{N+1}$ to be of a form that is expressible as the
output of a system consisting of a time-invariant zero-memory
nonlinearity (ZNL) followed by a linear filter. The ZNL is
characterized by a Borel measurable function $g(\cdot)$ such that
$g(X_i)$, $1 \le i \le N$, are second-order random variables. We can
now form our estimate of $X_{N+1}$ as a linear combination of the
$g(X_i)$ by projecting $X_{N+1}$ onto the linear manifold generated by
the modified observations $g(X_1), \ldots, g(X_N)$. If the weighting
sequence of the linear filter is given by $\{h_0, \ldots, h_{N-1}\}$,
then the estimate is given by

$$ \hat{X}_{N+1} = \sum_{n=1}^{N} g(X_n)\, h_{N-n} . \qquad (1) $$
ft  l 

We wish to determine a function $g(\cdot)$ and a set of coefficients
$h_0, \ldots, h_{N-1}$ so that the resulting mean-squared error is
minimized and is at least as good as that of the optimal linear filter.
Similar system structures have been employed in certain detection
applications [3].
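
For a fixed ZNL, choosing the coefficients in (1) is a projection, so the weights solve a set of normal equations. The sketch below illustrates this with N = 2 observations and sample averages in place of expectations. It is not the iterative procedure of the correspondence: the simulated process $X_{n+1} = \sin(2X_n) + W_n$, the noise level, the seed, and every function name are assumptions chosen only so that the conditional mean has the form of (1).

```python
import math
import random

# Sketch: fit the linear-filter weights for a fixed ZNL g by solving the
# sample normal equations, with N = 2 observations. The simulated process
# X[n+1] = sin(2 X[n]) + noise and all names are assumptions.


def simulate(length, seed=0):
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(length - 1):
        x.append(math.sin(2.0 * x[-1]) + rng.gauss(0.0, 0.3))
    return x


def fit_and_score(x, g):
    """Least-squares h_0, h_1 for Xhat[n+1] = h_0 g(X[n]) + h_1 g(X[n-1])."""
    rows = [(g(x[n]), g(x[n - 1]), x[n + 1]) for n in range(1, len(x) - 1)]
    # Entries of the 2x2 normal equations (sample second moments).
    a11 = sum(a * a for a, _, _ in rows)
    a12 = sum(a * b for a, b, _ in rows)
    a22 = sum(b * b for _, b, _ in rows)
    b1 = sum(a * t for a, _, t in rows)
    b2 = sum(b * t for _, b, t in rows)
    det = a11 * a22 - a12 * a12
    h0 = (b1 * a22 - b2 * a12) / det
    h1 = (a11 * b2 - a12 * b1) / det
    mse = sum((t - h0 * a - h1 * b) ** 2 for a, b, t in rows) / len(rows)
    return (h0, h1), mse


samples = simulate(20000)
h_lin, mse_lin = fit_and_score(samples, lambda v: v)                   # linear
h_znl, mse_znl = fit_and_score(samples, lambda v: math.sin(2.0 * v))   # ZNL
print(h_lin, mse_lin)
print(h_znl, mse_znl)
```

In this constructed example the ZNL-based predictor reaches an error close to the noise variance, while the purely linear predictor does not, which is the kind of gain the structure in (1) is meant to capture.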

We note that the purpose of the linear filter is simply to implement
the projection operation. The purpose of the ZNL is to modify the
observations in such a way that the resulting linear manifold contains
a large component of $X_{N+1}$, so that the error associated with the
projection is small. We note that in working with a nonlinear system of
this type, no statistical knowledge of the random process beyond that
contained in the family of bivariate distributions is ever required. In
some cases even less statistical knowledge suffices. For example, if
the ZNL is chosen to be a polynomial, then the required statistical
knowledge of the random process reduces to the family of certain joint
higher order moments of the random process.

Since we know only the second-moment characteristics of the random
process, the widest class of systems over which we could optimize is
the class of linear systems. Thus, to do better than is possible
using linear prediction, we must have more statistical knowledge of
the random process than its second-moment characteristics. Therefore,
since the ZNL serves the purpose of modifying the closed linear
manifold onto which $X_{N+1}$ is projected, and since the resulting
prediction scheme never requires statistical knowledge of the random
process beyond that contained in the family of bivariate
distributions, a nonlinear predictor of this type seems reasonable.

In Section II we consider some cases where the optimal estimate has
the form of (1). In the general case, the optimal predictor will not
have the form of (1), and thus a predictor of this form will be
suboptimal. This situation is discussed in Section III, where an
iterative scheme is presented for determining suboptimal predictors.
In Section IV examples are given to illustrate the method. Finally,
in Section V a noniterative approach utilizing a modified ZNL
structure is considered.

II. Optimal Prediction

In this section we consider some cases where the optimal filter has
the form of (1). Whenever the optimal filter is linear, then it
obviously has the form of (1) with $g(x) = x$. The class of
spherically invariant random processes [4] admits linear solutions,
with the most well-known examples being the Gaussian processes.

It is clear that the performance of the filter given by (1) can
always be made at least as good as that of the optimal linear filter.
In some cases the filter given by (1) can be optimal while the
optimal linear filter is useless. For example, let $X_n = P_n(U)$,
where $U$ is a random variable uniformly distributed over $[-1, 1]$
and $P_n(\cdot)$ is the $n$th Legendre polynomial [5]. In this case,
the sequence $\{X_n, n = 1, 2, \ldots\}$ is a sequence of
uncorrelated zero-mean random variables, and the optimal linear
filter yields an estimate which is zero. However, for
$g(x) = P_{N+1}(x)$ and

$$h_n = \begin{cases} 1, & n = N-1 \\ 0, & n \neq N-1, \end{cases}$$

the filter of (1) gives the estimate $\hat{X}_{N+1} = X_{N+1}$.
Numerous examples similar to this can easily be constructed.
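The Legendre example can be checked numerically. The sketch below is an illustration under stated assumptions: it uses NumPy's `Legendre.basis` and `leggauss` routines, and the helper `expect_uniform`, the choice $N = 4$, and the random seed are all hypothetical. It confirms that the $P_n(U)$ are zero mean and uncorrelated, and that $g = P_{N+1}$ applied to $X_1 = P_1(U) = U$ reproduces $X_{N+1}$ exactly.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

def expect_uniform(f, deg=20):
    """E{f(U)} for U uniform on [-1, 1], by Gauss-Legendre quadrature.

    The rule is exact for polynomials of degree <= 2*deg - 1.
    """
    nodes, weights = leggauss(deg)
    return 0.5 * np.sum(weights * f(nodes))   # the density of U is 1/2

N = 4                                          # number of observations (assumed)
P = [Legendre.basis(n) for n in range(N + 2)]  # P[n] is the nth Legendre polynomial

means = [expect_uniform(P[n]) for n in range(1, N + 2)]
cross = expect_uniform(lambda u: P[2](u) * P[3](u))
second = expect_uniform(lambda u: P[2](u) ** 2)   # equals 1 / (2*2 + 1)

# Prediction with g = P_{N+1}, h_{N-1} = 1 and all other weights zero.
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, size=5)
x1 = P[1](u)               # X_1 = P_1(U) = U
x_next = P[N + 1](u)       # the quantity to be predicted, X_{N+1}
estimate = P[N + 1](x1)    # g(X_1) * h_{N-1}
```

Because the observations are uncorrelated with $X_{N+1}$, any linear combination of them has zero correlation with the target, which is why the optimal linear estimate is zero while the nonlinear estimate is exact.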

When the process is a (first-order) Markov process, it is well known
[1, pp. 81-83] that
$E\{X_{N+1} \mid X_N, \ldots, X_1\} = E\{X_{N+1} \mid X_N\}$
with probability one (wp1). Thus a system of the form of (1) with a
ZNL given by $g(x) = E\{X_{N+1} \mid X_N = x\}$ and a weighting
sequence given by

$$h_n = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0 \end{cases}$$

will yield the optimal estimate of $X_{N+1}$.

Markov processes serve as models for many physical phenomena that
arise in practice. Often they are obtained as the solution of
first-order stochastic difference equations of the form

$$X_{n+1} = g(X_n) + U_n, \quad n = 0, 1, 2, \ldots,$$

where $g(\cdot)$ is a Borel measurable function and the sequence
$\{U_n\}$ is a sequence of zero-mean independent random variables
that are independent of the initial condition $X_0$. It is easily
seen that in this case we will have
$E\{X_{N+1} \mid X_N, \ldots, X_1\} = g(X_N)$ wp1.

Clearly, for any random process for which

$$E\{X_{N+1} \mid X_N, \ldots, X_1\} = \sum_{n=1}^{N} g(X_n) h_{N-n} \quad \text{wp1}, \tag{2}$$

a system of the form of (1) will produce the optimal estimate of
$X_{N+1}$. As another example of a process for which the conditional
expectation has the form of (2), consider the process generated by
the following second-order stochastic difference equation:

$$X_{n+2} = h_0 g(X_{n+1}) + h_1 g(X_n) + U_n, \quad n = -1, 0, 1, 2, \ldots, \tag{3}$$

where $g(\cdot)$ is a Borel measurable function and $\{U_n\}$ is a
sequence of zero-mean independent random variables independent of the
initial conditions $X_{-1}$ and $X_0$. It can be easily seen that for
this example, for any $N \geq 2$,

$$E\{X_{N+1} \mid X_N, \ldots, X_1\} = h_0 g(X_N) + h_1 g(X_{N-1}) \quad \text{wp1}.$$




IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-28, NO. 2, MARCH 1982


Extension of this example to the case where (3) is a $k$th order
stochastic difference equation is obvious.

III.  Suboptimal  Prediction 

In the general case there will not exist a function $g(\cdot)$ and a
weighting sequence $h_0, \ldots, h_{N-1}$ such that (2) is satisfied.
However, it is quite reasonable to conjecture that in many cases it
may be possible to determine a filter having the form of (1) with a
mean-squared error either significantly smaller than that associated
with the optimal linear filter or very close to the mean-squared
error associated with the optimal filter.

If the function $g(\cdot)$ that minimizes the mean-squared error is
known, the $g(X_n)$ will be well-defined random variables and the
determination of the $h_n$ that minimize the mean-squared error
reduces to an application of the projection theorem, that is,
setting

$$E\left\{\left[X_{N+1} - \sum_{n=1}^{N} h_{N-n} g(X_n)\right] g(X_j)\right\} = 0, \quad j = 1, \ldots, N, \tag{4}$$

and solving for the $h_n$. To carry out this step we need to
calculate the terms $E\{g(X_n) g(X_j)\}$ and $E\{X_{N+1} g(X_j)\}$.
In practice, the determination of the function $g(\cdot)$ that
minimizes the mean-squared error is a difficult problem.
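When $g(\cdot)$ is fixed, the orthogonality conditions in (4) are a set of $N$ linear equations in the $h_n$. The following sketch solves a sample-average version of these equations; the function name `projection_weights`, the use of empirical averages in place of true expectations, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def projection_weights(X, y, g):
    """Solve the sample version of (4) for h_0, ..., h_{N-1}.

    X : array of shape (S, N); each row holds X_1, ..., X_N (oldest first)
    y : array of shape (S,); the matching values of X_{N+1}
    g : zero-memory nonlinearity, applied elementwise
    """
    G = g(np.asarray(X, dtype=float))
    y = np.asarray(y, dtype=float)
    R = G.T @ G / len(y)        # estimates of E{g(X_n) g(X_j)}
    c = G.T @ y / len(y)        # estimates of E{X_{N+1} g(X_j)}
    w = np.linalg.solve(R, c)   # w[n-1] multiplies g(X_n)
    return w[::-1]              # reorder so that index k holds h_k

# Synthetic check: the target is an exact combination of g(X_1) and g(X_2).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
g = np.tanh                     # hypothetical ZNL
y = 2.0 * g(X[:, 1]) + 0.5 * g(X[:, 0])
h = projection_weights(X, y, g)
```

In this synthetic case the recovered weights are $h_0 = 2$ (on the newest sample) and $h_1 = 0.5$, since the target lies exactly in the span of the modified observations.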

Notice that, in the optimization problem where the filter is
constrained to be of the form in (1), only second-order information
(i.e., the family of bivariate distributions) is required. This is
more statistical information than would be required if we were doing
optimal linear filtering, which requires only second-moment
information. However, it is still considerably less statistical
information than would be required if we were doing optimal
filtering, which requires statistical information pertaining to an
$(N+1)$-dimensional distribution.

In order to circumvent the difficult problem of determining the
function $g(\cdot)$ to use in (1), we will sacrifice some degree of
optimality and parameterize $g(\cdot)$, thus letting the
determination of $g(\cdot)$ simply depend upon finding the correct
parameters. Doing so, we then write the resulting mean-squared error
as a function of the parameters associated with $g(\cdot)$ and the
weighting sequence of the linear filter. In this case, the
mean-squared error would be a function of $K + N$ parameters, where
$K$ is the number of parameters associated with $g(\cdot)$. For
example, let $g(\cdot)$ be given by

$$g(x) = \sum_{i=1}^{K} a_i b_i(x).$$

Then our estimate is given by

$$\hat{X}_{N+1} = \sum_{n=1}^{N} \sum_{i=1}^{K} a_i h_{N-n} b_i(X_n),$$

and the resulting mean-squared error is given by

$$E\{[X_{N+1}]^2\} - 2 \sum_{n=1}^{N} \sum_{i=1}^{K} a_i h_{N-n} E\{X_{N+1} b_i(X_n)\}
+ \sum_{m=1}^{N} \sum_{n=1}^{N} \sum_{i=1}^{K} \sum_{k=1}^{K} a_i a_k h_{N-m} h_{N-n} E\{b_i(X_m) b_k(X_n)\}. \tag{5}$$

The functions $b_i(\cdot)$ should be determined so that there is
considerable flexibility in the functional form of $g(\cdot)$ and
also so that the expectations in (5) could be determined from the
statistical information at hand. For example, if each $b_i(x)$ is a
power of $x$, then the necessary statistical information would
consist of the higher order joint moments.

The next step might be to minimize (5) over the $N + K$ parameters.
This would result in $N + K$ equations of third-order polynomials in
the parameters. This simultaneous optimization over all the
parameters presents potential numerical problems. As an alternative
to the simultaneous optimization over all the parameters, we describe
an iterative technique.

The  basic  plan  of  the  iterative  technique  is  to  consider  the  two 
sets  of  parameters  separately  and  to  iteratively  optimize  over  one 
set  of  parameters  while  holding  the  other  set  fixed.  This  iterative 
technique  results  in  the  need  to  solve  systems  of  linear  equations, 
as  opposed  to  the  need  to  solve  systems  of  equations  in  third-order 
polynomials such as encountered in the effort to simultaneously
optimize  over  all  the  parameters. 

We will assume that the parametric form of $g(\cdot)$ is such that
with the proper choice of parameters we could have $g(x) = x$. In
this way the mean-squared error that results will always be upper
bounded by the mean-squared error associated with the optimal linear
filter.

The iterative technique is as follows.

Step 1: Determine the optimal weighting sequence
$h_0, \ldots, h_{N-1}$ for the case where $g(x) = x$.

Step 2: For this choice of $h_0, \ldots, h_{N-1}$, determine
$a_1, \ldots, a_K$ so as to minimize the mean-squared error.

Step 3: For this choice of $a_1, \ldots, a_K$, determine the optimal
weighting sequence $h_0, \ldots, h_{N-1}$.

Step 4: Repeat Steps 2 and 3 until the improvement in the
mean-squared error is negligible.

At each stage of execution the algorithm provides a system design
whose mean-square estimation error is no larger than that for the
previous step of the algorithm.
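The four steps above can be sketched as an alternating least-squares loop. This is a minimal illustration under stated assumptions, not the authors' implementation: expectations are replaced by sample averages, the name `fit_znl_lti` is hypothetical, a fixed iteration count stands in for the stopping rule of Step 4, and the test data are a cubed first-order Gaussian autoregression chosen only for the demonstration.

```python
import numpy as np

def fit_znl_lti(X, y, basis, iters=10):
    """Alternate between the filter weights and the ZNL parameters.

    X     : (S, N) array, each row holding X_1, ..., X_N (oldest first)
    y     : (S,) array of the matching X_{N+1}
    basis : list of K callables; basis[0] must be the identity so that
            the starting point of Step 1 is g(x) = x
    Returns (a, w, mse_history); w[n-1] multiplies g(X_n), i.e. w[n-1] = h_{N-n}.
    """
    Phi = np.stack([p(X) for p in basis], axis=-1)    # shape (S, N, K)
    a = np.zeros(len(basis))
    a[0] = 1.0                                         # Step 1: g(x) = x
    w = np.linalg.lstsq(Phi @ a, y, rcond=None)[0]     # best linear filter
    mse_history = [np.mean((y - (Phi @ a) @ w) ** 2)]
    for _ in range(iters):
        # Step 2: hold w fixed; the estimate is linear in a.
        F = np.einsum('snk,n->sk', Phi, w)
        a = np.linalg.lstsq(F, y, rcond=None)[0]
        # Step 3: hold a fixed; the estimate is linear in w.
        G = Phi @ a
        w = np.linalg.lstsq(G, y, rcond=None)[0]
        mse_history.append(np.mean((y - G @ w) ** 2))
    return a, w, mse_history

# Demonstration data (assumed): X_n = Z_n^3 with Z_n a Gaussian AR(1) sequence.
rng = np.random.default_rng(0)
rho, S, N = 0.8, 20000, 2
z = np.empty(S + N)
z[0] = rng.normal()
for t in range(1, S + N):
    z[t] = rho * z[t - 1] + np.sqrt(1.0 - rho ** 2) * rng.normal()
x = z ** 3
X = np.column_stack([x[k:k + S] for k in range(N)])   # columns X_1, ..., X_N
y = x[N:N + S]
basis = [lambda v: v, np.cbrt]                         # g(x) = a_1 x + a_2 x^(1/3)
a, w, mse = fit_znl_lti(X, y, basis, iters=5)
```

Each half-step solves a linear least-squares problem in which the previous parameter values remain feasible, so the recorded mean-squared error cannot increase from one iteration to the next.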

The $a_1, \ldots, a_K$ and $h_0, \ldots, h_{N-1}$ that are obtained
in Step 4 after the termination of the iterations determine the
system. Step 1 and Step 3 make use of the projection theorem and
result in

$$E\{X_{N+1} g(X_j)\} = \sum_{n=1}^{N} h_{N-n} E\{g(X_n) g(X_j)\}, \quad j = 1, \ldots, N.$$

Step 2 makes use of (5) and results in

$$\sum_{m=1}^{N} \sum_{n=1}^{N} h_{N-m} h_{N-n} \sum_{k=1}^{K} a_k E\{b_j(X_m) b_k(X_n)\}
= \sum_{n=1}^{N} h_{N-n} E\{X_{N+1} b_j(X_n)\}, \quad j = 1, \ldots, K.$$

IV. Examples

In this section we consider a particular parametric form for the ZNL
and a specific model for the random sequence. The iterative method
described earlier is used in this case to determine a filter of the
form of (1). We also determine the mean-squared error resulting from
use of the optimal filter and that resulting from use of the optimal
linear filter. Performance results for these filters are compared,
and it is seen that in several instances the improvement in
mean-squared error of the suboptimal filter over that of the optimal
linear filter is a significant fraction of the corresponding
improvement of the optimal filter over that of the optimal linear
filter.

Assume that we have knowledge of the regression function for
stationary $\{X_n\}$:

$$r(x) = E\{X_{N+1} \mid X_N = x\}. \tag{6}$$

Notice that if we choose $g(x) = r(x)$ and

$$h_n = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0, \end{cases}$$

then the estimate would be the same as that of the optimal filter
based on the most recent observation. If we were to use the
projection theorem to choose a different weighting sequence
$\{h_n\}$, we might do better. It seems reasonable to expect that if
we were to parameterize $g(\cdot)$ so that by proper choice of the
parameters we would have $g(x) = r(x)$, and then to use this
parameterization of the ZNL in the iterative technique described
earlier, we might determine a system of the form of (1) exhibiting
very good performance. This is how we will choose the ZNL in this
section.

As a model for the random sequence $\{X_n, n = 1, 2, \ldots\}$ we
assume that

$$X_n = (Z_n)^{2q+1}, \tag{7}$$

where $q$ is a positive integer and $\{Z_n, n = 1, 2, \ldots\}$ is a
zero-mean stationary Gaussian process with unit variance and
autocorrelation function $\rho(\cdot)$. First we derive an expression
for the regression function (6) when the random sequence is given by
(7). Using results in [6], we have that

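$$E\{X_{N+1} \mid X_N\} = E\{(Z_{N+1})^{2q+1} \mid Z_N\}
= \sum_{n=0}^{\infty} [\rho(1)]^n b_n \theta_n(Z_N)
= \sum_{n=0}^{\infty} [\rho(1)]^n b_n \theta_n\left((X_N)^{1/(2q+1)}\right),$$

where the series are mean-square convergent, the constants $\{b_n\}$
are given by

$$b_n = \int_{-\infty}^{\infty} x^{2q+1} \theta_n(x) (2\pi)^{-1/2} \exp\left(-\frac{x^2}{2}\right) dx,$$

and $\theta_n$ is the $n$th normalized Hermite polynomial given by

$$\theta_n(x) = \frac{(-1)^n}{\sqrt{n!}} \exp\left(\frac{x^2}{2}\right) \frac{d^n}{dx^n} \exp\left(-\frac{x^2}{2}\right). \tag{8}$$

We see from (8) that $b_n = 0$ for $n > 2q + 1$ and, in fact, the
$b_n$ can be obtained from the relation

$$x^{2q+1} = \sum_{n=0}^{2q+1} b_n \theta_n(x).$$

For example, for $q = 1$ we have

$$b_n = \begin{cases} 3, & n = 1 \\ \sqrt{6}, & n = 3 \\ 0, & n \neq 1, 3, \end{cases}$$

and $r(x)$ is given by

$$r(x) = [\rho(1)]^3 x + 3\rho(1)\left(1 - [\rho(1)]^2\right) x^{1/3}.$$

For $q = 2$,

$$b_n = \begin{cases} 15, & n = 1 \\ 10\sqrt{6}, & n = 3 \\ 2\sqrt{30}, & n = 5 \\ 0, & n \neq 1, 3, 5, \end{cases}$$

thus

$$r(x) = [\rho(1)]^5 x + 10[\rho(1)]^3\left(1 - [\rho(1)]^2\right) x^{3/5}
+ 15\rho(1)\left(1 - [\rho(1)]^2\right)^2 x^{1/5}.$$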
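The closed forms of $r(x)$ for $q = 1$ and $q = 2$ can be cross-checked numerically. The sketch below is an illustration only: it relies on the standard fact that, for a unit-variance stationary Gaussian sequence, $Z_{N+1}$ given $Z_N = z$ is normal with mean $\rho(1) z$ and variance $1 - [\rho(1)]^2$, and it uses NumPy's `hermegauss` quadrature rule. The function names and the test values of $x$ and $\rho(1)$ are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def signed_power(x, p):
    """Real odd-root power: sign(x) * |x|**p."""
    return np.sign(x) * np.abs(x) ** p

def regression_numeric(x, rho, q, deg=40):
    """E{(Z_{N+1})^(2q+1) | Z_N = x^(1/(2q+1))} by Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(deg)          # weight function exp(-t^2 / 2)
    z = signed_power(x, 1.0 / (2 * q + 1))
    samples = rho * z + np.sqrt(1.0 - rho ** 2) * nodes
    return np.sum(weights * samples ** (2 * q + 1)) / np.sqrt(2.0 * np.pi)

def regression_closed_form(x, rho, q):
    """The expressions for r(x) quoted in the text for q = 1 and q = 2."""
    s = 1.0 - rho ** 2
    if q == 1:
        return rho ** 3 * x + 3 * rho * s * signed_power(x, 1 / 3)
    if q == 2:
        return (rho ** 5 * x
                + 10 * rho ** 3 * s * signed_power(x, 3 / 5)
                + 15 * rho * s ** 2 * signed_power(x, 1 / 5))
    raise ValueError("closed form written out only for q = 1 and q = 2")

checks = [(x, q, regression_numeric(x, 0.6, q), regression_closed_form(x, 0.6, q))
          for x in (-2.0, 0.7) for q in (1, 2)]
```

The quadrature rule with 40 nodes is exact for the low-degree polynomials involved, so the two columns in `checks` agree to rounding error.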

In general, for an arbitrary positive integer $q$, it is easily seen
that $r(x)$ has the form

$$r(x) = \sum_{i=1}^{q+1} c_i x^{(2i-1)/(2q+1)},$$

where the $c_i$ are constants that can be determined using the above
procedure. Thus we choose the ZNL $g(\cdot)$ to be

$$g(x) = \sum_{i=1}^{q+1} a_i x^{(2i-1)/(2q+1)},$$


where the parameters $a_i$ are to be determined by the iterative
procedure. In utilizing the iterative procedure we encounter the need
for the knowledge of moments and joint moments of $\{Z_n\}$ (see
[7]), which are given by

$$E\{(Z_n)^p\} = \begin{cases} 1 \cdot 3 \cdot 5 \cdots (p - 1), & \text{for } p \text{ even} \\ 0, & \text{for } p \text{ odd} \end{cases}$$

$$E\{(Z_n)^r (Z_{n+i})^s\} = \mu(r, s, i)
= \begin{cases}
(r + s - 1)\rho(i)\mu(r - 1, s - 1, i) + (r - 1)(s - 1)\left(1 - [\rho(i)]^2\right)\mu(r - 2, s - 2, i), & \text{for } (r + s) \text{ even} \\
0, & \text{for } (r + s) \text{ odd.}
\end{cases} \tag{9}$$

Observing that $\mu(1, 1, i) = \rho(i)$ and
$\mu(2, 2, i) = 1 + 2[\rho(i)]^2$, all higher order joint moments can
be calculated using (9).
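The recursion in (9) translates directly into a short memoized routine. The sketch below is an illustration; the function names are hypothetical, the correlation $\rho(i)$ is passed in as a plain number `rho`, and the base cases $\mu(0, s, i) = E\{Z^s\}$ and $\mu(r, 0, i) = E\{Z^r\}$ are an assumption added here so that the recursion terminates.

```python
from functools import lru_cache

def gaussian_moment(p):
    """E{Z^p} for a standard normal Z: 1*3*5*...*(p-1) if p is even, else 0."""
    if p % 2:
        return 0.0
    out = 1.0
    for k in range(1, p, 2):
        out *= k
    return out

def joint_moment(r, s, rho):
    """mu(r, s, i) of (9) for a fixed lag i with correlation rho = rho(i)."""
    @lru_cache(maxsize=None)
    def mu(r, s):
        if (r + s) % 2:
            return 0.0
        if r == 0:
            return gaussian_moment(s)   # assumed base case
        if s == 0:
            return gaussian_moment(r)   # assumed base case
        value = (r + s - 1) * rho * mu(r - 1, s - 1)
        if r >= 2 and s >= 2:
            # When r == 1 or s == 1 this term carries a zero factor and is skipped.
            value += (r - 1) * (s - 1) * (1.0 - rho ** 2) * mu(r - 2, s - 2)
        return value
    return mu(r, s)

m22 = joint_moment(2, 2, 0.5)   # 1 + 2 rho^2
m33 = joint_moment(3, 3, 0.5)   # 9 rho + 6 rho^3
m44 = joint_moment(4, 4, 0.5)   # 9 + 72 rho^2 + 24 rho^4
```

For $\rho = 0.5$ these evaluate to $1.5$, $5.25$, and $28.5$, matching the closed-form bivariate Gaussian moments noted in the comments.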

In order to compare the performance of the suboptimal estimator with
that of the optimal estimator, we have obtained expressions for the
mean-squared error associated with the optimal estimator. For the
optimal system we are interested in

$$E\{(Z_{N+1})^{2q+1} \mid Z_N, \ldots, Z_1\}.$$

Notice that this is the $(2q + 1)$th conditional moment, and the
conditional distribution has the functional form of a Gaussian
distribution. Thus the minimum mean-squared error follows using
standard properties of the Gaussian distribution (see, for example,
[8]). For $q = 1$ we find that the minimum mean-squared error is of
the form

$$15 - P_1^4\left[9E\{Y^2\} + 6P_1 E\{Y^4\} + P_1^2 E\{Y^6\}\right],$$

and for $q = 2$, the minimum mean-squared error is of the form

$$945 - P_1^6\left[225E\{Y^2\} + 300P_1 E\{Y^4\} + 130P_1^2 E\{Y^6\} + 20P_1^3 E\{Y^8\} + P_1^4 E\{Y^{10}\}\right].$$

In these expressions $P_1$ is a constant, and $Y$ is a normal random
variable with zero mean and variance $\gamma^2$. The constants $P_1$
and $\gamma^2$ are defined as follows. Assume without loss of
generality that the correlation matrix $R$ associated with
$Z_1, \ldots, Z_{N+1}$ is positive definite (if it is not, the data
can be reduced to achieve this result). Then $P_1$ is the reciprocal
of the element in the lower right corner of $R^{-1}$. Denote the
first $N$ elements in the last row of $R^{-1}$ as
$r_1, \ldots, r_N$. Then

$$\gamma^2 = \sum_{i=1}^{N} (r_i)^2 + 2 \sum_{m=1}^{N-1} \sum_{n=1}^{m} r_n r_{n+N-m} \rho(N - m).$$

The mean-squared error associated with the optimal linear filter can
now be obtained in a straightforward fashion.

The  mean-squared  error  associated  with  the  optimal  linear  filter 
can  now  be  obtained  in  a  straightforward  fashion 

In Tables I-VIII results are presented comparing the suboptimal
filter to the optimal filter and the optimal linear filter. Several
correlation sequences for $\{Z_n\}$ are considered, both the third
power and the fifth power of $Z_n$ are used as models, and examples
for two observations and five observations are given. In these tables
$L_l$, $L_s$, and $L_{\min}$ are the mean-squared errors resulting
from the optimal linear filter, suboptimal filter using a ZNL, and
the optimal filter, respectively. The quantity $\eta_1$ is the
percent of decrease in $L_l$ when the suboptimal filter using a ZNL
is employed, i.e., $\eta_1 = 100(L_l - L_s)/L_l$. The quantity
$\eta_2$ is the percent of possible improvement in $L_l$ using the
optimal filter, i.e., $\eta_2 = 100(L_l - L_{\min})/L_l$. The
quantity $\eta_3$ is the normalized percent of improvement over the
linear filter given by the suboptimal filter using a ZNL, i.e.,
$\eta_3 = 100\eta_1/\eta_2 = 100(L_l - L_s)/(L_l - L_{\min})$.
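The three percentages defined above are simple ratios of the mean-squared errors. The sketch below restates them in code; the function name and the input values are hypothetical and are not entries from the tables.

```python
def improvement_percentages(L_lin, L_sub, L_min):
    """Return (eta_1, eta_2, eta_3) as defined in the text.

    L_lin : mean-squared error of the optimal linear filter
    L_sub : mean-squared error of the suboptimal filter using a ZNL
    L_min : mean-squared error of the optimal filter
    """
    eta1 = 100.0 * (L_lin - L_sub) / L_lin
    eta2 = 100.0 * (L_lin - L_min) / L_lin
    eta3 = 100.0 * eta1 / eta2      # equals 100 (L_lin - L_sub) / (L_lin - L_min)
    return eta1, eta2, eta3

# Hypothetical values chosen only to make the arithmetic easy to follow.
eta1, eta2, eta3 = improvement_percentages(10.0, 9.0, 8.0)
```

With these inputs the suboptimal filter recovers half of the improvement that the optimal filter offers over the linear filter, so $\eta_1 = 10$, $\eta_2 = 20$, and $\eta_3 = 50$.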




TABLE I
Correlation Sequences Corresponding to Tables II-V

      $\rho(1)$   $\rho(2)$   $\rho(3)$   $\rho(4)$   $\rho(5)$
1                 .525        .45         .35885      .291
2     .MS         .2842       .70262      .639        .5805
3     .55         .515        .182        .11445      .07183
4     .55         .395        .319        .26445      .23023
5     .42*        .2525       .14625      .09444      .06207
6     .8351       .6666       .5          .3333       .1666
7     .5787       .2943       .125        .037        .00463
8     .4822       .1975       .0625       .0123       .00077

TABLE V
Coefficients $a_i$ of ZNL $g(x) = a_1 x^{1/5} + a_2 x^{3/5} + a_3 x$
and $h_i$ of Suboptimal System for $q = 2$

      $h_0$    $h_1$     $h_2$     $h_3$     $h_4$     $a_1$      $a_2$     $a_3$
1     .4779    .0119     .0063     .0042     .0052     4.0527     3.7727    .493
2     .7065    .0097     .0067     .005      .0093     .7136      2.032     .7585
3     .2563    .0059     .0028     .0017     .0014     15.173     4.5019    .196
4     .2466    .0472     .0282     .019      .017      11.733     4.465     .1966
5     .162     .0227     .009      .0043     .0026     23.858     3.802     .0839
6     .6534    -.034     -.0234    -.0136    -.024     2.742      2.9562    .6302
7     .2864    -.0184    -.0096    -.0054    .0002     14.7841    4.5769    .2267
8     .2032    -.0139    -.0065    .0019     .0008     22.373     4.2663    .128



TABLE II
Mean-Squared Errors and Percentages of Improvement for $q = 1$

      $L_l$      $L_s$      $L_{\min}$   $\eta_1$   $\eta_2$   $\eta_3$
1     9.1983     8.8614     6.8581       3.6        3.69       97.3
2     5.1744     5.0622     5.3599       2.16       2.21       97.6
3     12.5987    12.1084    12.108       3.89       3.89       99.5
4     12.3196    11.9216    11.8952      3.23       3.44       93.7
5     13.6849    15.2957    15.293       2.84       2.86       99.1
6     6.9247     6.6228     6.4926       4.36       6.23       69.8
7     12.2903    11.732     11.7259      4.54       4.59       98.8
8     13.3219    12.8142    12.8123      3.51       3.82       99.6

TABLE III
Mean-Squared Errors and Percentages of Improvement for $q = 2$

      $L_l$     $L_s$     $L_{\min}$   $\eta_1$   $\eta_2$   $\eta_3$
1     727.42    704.58    704.22       3.13       3.18       98.1
2     453.78    444.76    444.49       1.98       2.04       96.7
3     887.49    859.95    859.9        3.1        3.1        99.7
4     379.44    354.59    851.86       2.82       5.13       89.8
5     920.93    899.7     499,43       2.3        2.33       98.5
6     584.57    564.58    550.99       3.41       5.74       59.3
7     576.53    845.86    845.24       5.47       3.54       97.7
8     91J.86    584.62    584.42       2.88       2.9        99.2

TABLE IV
Coefficients $a_i$ of Nonlinearity $g(x) = a_1 x^{1/3} + a_2 x$ and
$h_i$ of Suboptimal System for $q = 1$

      $h_0$    $h_1$     $h_2$     $h_3$     $h_4$     $a_1$     $a_2$
1     .6115    .0127     .008      .0059     .0094     1.519     .6811
2     .7899    .0084     .0064     .0051     .0132     .674      .362
3     .4026    .0091     .0049     .0029     .0024     2.7896    .4114
4     .1654    .0687     .043 J    .0297     .028      2.4749    .4114
5     .2827    .0407     .0164     .0076     .0047     3.1779    .2654
6     .776     .0214     -.0175    -.0111    -.0662    1.2015    .7615
7     .4476    -.024     -.013     -.01      -.0015    2.775     .4575
8     .1505    -.0247    -.0121    -.0012    -.0017    3.342     .1215

As mentioned earlier, the functions $b_i(\cdot)$ should be determined
so that considerable flexibility exists in the functional form of
$g(\cdot)$. For example, if $X_{N+1}$ has a nonzero mean, then
choosing one of the $b_i(\cdot)$ to be constant would enable the mean
to be subtracted out and thus decrease the mean-squared error. In
this case, for example, Step 1 of the algorithm should be replaced
with the following: determine the optimal weighting sequence
$h_0, \ldots, h_{N-1}$ for the case where $g(x) = x + 1$. In this
case, Step 1 will result in the best affine filter (i.e., linear plus
a constant), as opposed to the best linear filter.

As we also mentioned earlier, the functions $b_i(\cdot)$ should be
chosen such that the expectations in (5) could be determined from the
statistical information at hand. To once again test this method of
nonlinear prediction, we simulated the following difference equation
driven by white noise and empirically estimated the necessary
expectations from the simulated quantities:

X„  .  |  =  -  1 .74 X~  +  0.005 

where the sequence ( ) is a sequence of independent random


TABLE VI
Correlation Sequences Corresponding to Tables VII, VIII

      $\rho(1)$   $\rho(2)$
1     .9          .7
2     .8          .5
3     .8          .3
4     .7          .1

TABLE VII
Coefficients $a_i$ of ZNL $g(x) = a_1 x^{1/3} + a_2 x$ and $h_i$ of
Suboptimal System for $q = 1$

      $h_0$     $h_1$     $a_1$     $a_2$
1     1.2377    -.4974    .9333     .32983
2     .3837     -.1001    1.6639    .6921
3     1.095     -.646?    2.3987    .6089
4     .7927     -.4786    3.2982    .4545

TABLE VIII
Mean-Squared Errors and Percentages of Improvement for $q = 1$

      $L_l$     $L_s$     $L_{\min}$   $\eta_1$   $\eta_2$   $\eta_3$
1               1,76X7    J.494        6,79       M.J        4J.65
2     7.566     7.0273    6.7406       7.12       10.9       65.32
3     s.rite    4.371     1.0231       24.38      82.3       29.62
4     S.M2S     7.1689    4.9674       20.19      44.7       45.16

variables  uniformly  distributed  on  (-1/2. 1/2).  Letting  v )  - 
«■„  i  1 1  +  c,  a  *,  we  sec  that  it  is  possible  to  realize  the  best 
predictor with a nonlinear system of the form under consideration.
We took $N = 2$ and empirically estimated the expectations occurring
in (5). After one iteration of the algorithm, the empirically
estimated mean-squared error was reduced from 0.085 to
(Mttio.l  I . 


V.  An  Alternate  Design  Approach 

In the preceding, we considered an iterative procedure for the design
of the nonlinear predictor. In this section we will consider a
generalization of that concept which results in a noniterative
procedure. Recall that the purpose of the ZNL was to modify the
linear manifold onto which $X_{N+1}$ is projected. The purpose of the
linear filter was simply to implement the projection onto the linear
manifold generated by $g(X_1), \ldots, g(X_N)$. If the ZNL were
allowed to change, then the possibility exists of choosing the ZNL
such that a larger component of $X_{N+1}$ lies within the linear
manifold spanned by its output.

In the earlier case with a single ZNL we have sacrificed some degree
of optimality by parameterizing the ZNL and then letting the
determination of $g(\cdot)$ depend upon finding the correct
parameters. In this situation, the mean-squared error was a function
of $N + K$ parameters. If we now allow for $N$ such ZNL's in the
system, then the mean-squared error will be a function of $N(K + 1)$
parameters. It may appear at first glance that we have now made the
problem much more complex, due to the introduction of more
parameters. However, as we shall see shortly, this alternative
approach will result in a noniterative design procedure.




With $N$ ZNL's the estimate is given by

$$\hat{X}_{N+1} = \sum_{n=1}^{N} g_n(X_n) h_{N-n},$$

where the $N$ ZNL's are given by

$$g_n(x) = \sum_{i=1}^{K} a_{ni} b_i(x).$$

In this case, if we let $\alpha_{ni} = a_{ni} h_{N-n}$, then the ZNL
$g_n(\cdot)$ could be replaced by

$$g_n(x) = \sum_{i=1}^{K} \alpha_{ni} b_i(x),$$

and the linear filter could be replaced by an accumulator, and the
mean-squared error will be a function of $NK$ parameters. In the
sequel we will take this approach. Thus our estimate is now of the
form

$$\hat{X}_{N+1} = \sum_{n=1}^{N} \sum_{i=1}^{K} \alpha_{ni} b_i(X_n), \tag{10}$$

and we wish to determine the parameters $\{\alpha_{ni}\}$. The
minimum mean-squared error estimate of this form is given by
projecting $X_{N+1}$ onto the linear manifold generated by the $NK$
random variables $\{b_i(X_n)\}$. Thus the parameters
$\{\alpha_{ni}\}$ are given as a solution to

$$BA = C, \tag{11}$$

where $A$ is a $KN$-dimensional column vector of the parameters
$\{\alpha_{ni}\}$ ordered lexicographically, $B$ is a $KN \times KN$
matrix whose general term is of the form $E\{b_i(X_j) b_k(X_m)\}$,
where the lexicographic order of $i$ and $j$ denotes the column and
the lexicographic order of $k$ and $m$ denotes the row, and $C$ is a
$KN$-dimensional column vector made up of the terms
$E\{X_{N+1} b_j(X_n)\}$ ordered lexicographically in $j$ and $n$. We
note that if the parameters $\{\alpha_{ni}\}$ are such that (11) is
satisfied, then the resulting estimate given by (10) is the minimum
mean-squared error estimate, and by the projection theorem it is
uniquely defined up to probability-one equivalence. That is, more
than one solution to (11) may exist; however, for any number of
solutions to (11), the resulting estimates are all equal with
probability one. Also, the projection theorem guarantees that at
least one solution to (11) exists.

As a specific example, we might choose $b_i(x) = x^{i-1}$. In this
case, the matrix $B$ will consist of various moments and cross
moments of the set of random variables $\{X_1, \ldots, X_N\}$.
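The noniterative design reduces to forming $B$ and $C$ from moments and solving (11). The sketch below illustrates this with empirical moments. It is a hedged example: the difference equation in `simulate` is a hypothetical model chosen here because it stays bounded, the function names are assumptions, and a least-squares solve is used because the constant term $b_1(x) = 1$ appears once per observation, which makes $B$ singular (one of the cases where (11) has more than one solution).

```python
import numpy as np

def simulate(n_samples, rng):
    """Hypothetical bounded model: X_{n+1} = 0.5 - 0.4 X_n^2 + 0.1 X_{n-1} + 0.05 U_n."""
    x = np.zeros(n_samples + 2)
    u = rng.uniform(-0.5, 0.5, size=n_samples + 2)
    for t in range(1, n_samples + 1):
        x[t + 1] = 0.5 - 0.4 * x[t] ** 2 + 0.1 * x[t - 1] + 0.05 * u[t]
    return x

def fit_noniterative(x, K=3):
    """Solve the empirical version of B A = C for N = 2 and b_i(x) = x^(i-1)."""
    obs = np.column_stack([x[:-2], x[1:-1]])      # rows (X_1, X_2)
    target = x[2:]                                # the matching X_3
    # Columns: b_1(X_1), b_1(X_2), b_2(X_1), b_2(X_2), ..., b_K(X_2).
    feats = np.concatenate([obs ** j for j in range(K)], axis=1)
    B = feats.T @ feats / len(target)             # estimates of E{b_i(X_j) b_k(X_m)}
    C = feats.T @ target / len(target)            # estimates of E{X_3 b_j(X_n)}
    # B is rank deficient (duplicated constant), so take a least-squares solution.
    A = np.linalg.lstsq(B, C, rcond=1e-12)[0]
    mse = np.mean((target - feats @ A) ** 2)
    return A, mse

rng = np.random.default_rng(0)
x = simulate(20000, rng)
A, mse = fit_noniterative(x)
noise_floor = 0.05 ** 2 / 12.0    # variance of 0.05 * U_n, U_n uniform on (-1/2, 1/2)
```

Since the simulated model lies in the span of the chosen $b_i(X_n)$, the empirical mean-squared error should sit close to the noise variance `noise_floor`, which is the smallest error any predictor can achieve for this hypothetical model.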

To compare the two methods, we simulated the following difference
equation:

$$X_{n+1} = -0.87 + 1.74 (X_n)^2 + 0.13 X_{n-1} + 0.05 U_n,$$

where the $U_n$ were independent random variables uniformly
distributed over $(-1/2, 1/2)$. We set $N = 2$, $K = 3$, and
$b_i(x) = x^{i-1}$. The necessary moments and cross moments were
empirically estimated from the simulated quantities. The iteration
procedure using a single ZNL yielded an estimate given by

X,  0  W)3677g(  X: )  v  0  00350M  .V, ). 

where

y(  v  )  I  t-  0  09741  (tv  I  856364a 

The noniterative procedure using $N$ ZNL's yielded an estimate
given by

OK288IO  i  i) 046003 X  i  1 .7420X6 A. 

+  0  132142  V,  o  0X01 10  .V,'. 


If the actual moments and cross moments had been used in the
noniterative procedure, then for this example the exact minimum
mean-squared error estimate, given by

$$\hat{X}_3 = -0.87 + 1.74 (X_2)^2 + 0.13 X_1,$$

would have resulted. The resulting mean-squared errors were
empirically estimated from the simulated quantities and are given by
0.003503 and 0.000205 for the iterative and noniterative procedures,
respectively. The actual minimum mean-squared error for this problem
is 2.5/12000 = 0.0002083.

VI.  Summary 

We  investigated  the  design  of  nonlinear  discrete-time  predic¬ 
tion  filters.  We  motivated  our  approach  through  the  concept  of 
modifying  or  augmenting  the  subspace  generated  by  the  observa¬ 
tions  in  such  a  way  so  as  to  have  a  larger  signal  component 
present  within  this  augmented  subspace.  The  form  of  the  system 
under  study  was  that  of  a  zero-memory  nonlinearity  followed  by 
a linear time-invariant filter (ZNL-LTI). We have shown that in
many cases, where the optimum nonlinearity is known, the ZNL-LTI
structure produces nearly optimum results. Finally, an extension
to the use of several ZNL's was considered.

References 

[1] J. L. Doob, Stochastic Processes. New York: Wiley, 1953.
[2] T. Kailath, Ed., Linear Least-Squares Estimation. Stroudsburg, PA:
    Dowden, Hutchinson, and Ross, 1977.
[3] J. H. Miller and J. B. Thomas, "Detectors for discrete-time signals in
    non-Gaussian noise," IEEE Trans. Inform. Theory, vol. IT-18,
    pp. 241-250, Mar. 1972.
[4] I. F. Blake and J. B. Thomas, "On a class of processes arising in linear
    estimation theory," IEEE Trans. Inform. Theory, vol. IT-14, pp. 12-16,
    Jan. 1968.
[5] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions.
    New York: Dover, 1964.
[6] G. L. Wise and J. B. Thomas, "A characterization of Markov sequences,"
    J. Franklin Inst., vol. 299, pp. 269-278, Apr. 1975.
[7] N. L. Johnson and S. Kotz, Distributions in Statistics: Continuous
    Multivariate Distributions. New York: Wiley, 1972, p. 91.
[8] K. S. Miller, Multidimensional Gaussian Distributions. New York: Wiley,
    1964, pp. 21-22.




IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-28, NO. 2, MARCH 1982


The  Design  of  Two-Dimensional  Quantizers 

using  Prequantization 

KERRY  D.  RINES,  member,  IEEE,  and  NEAL  C.  GALLAGHER,  JR.,  member,  IEEE 


Abstract: The theoretical advantages of two-dimensional quantization
over univariate quantization have been studied in the literature. However,
in many cases there is no known implementation for the two-dimensional
quantizer that can operate in real time. A new approach to the design of
two-dimensional quantizers is presented. This technique, called
prequantization, is used to design two-dimensional quantizers that operate
in real time. The importance of prequantization is demonstrated by the
design of the optimum uniform two-dimensional (hexagonal) quantizer.
Additional examples are given to illustrate the flexibility of this design
approach.

I.  Introduction 

THE  USE  OF  two-dimensional  quantizers  for  encoding 
analog  sources  has  been  of  increasing  interest  in  recent 
years.  Two-dimensional  quantizers  can  offer  advantages  in 
the  design  of  both  optimum  and  suboptimum  quantizers. 
These  advantages  may  be  offset  by  the  difficulty  in  imple¬ 
menting  many  two-dimensional  quantizers.  In  this  paper 
we present a new approach to the design of two-dimensional
quantizers called prequantization. We show that for
a number of examples prequantization simplifies the
quantizer  implementation  and/or  improves  the  quantizer 
performance. 

Manuscript received Feb. 19, 1980; revised March 12, 1981. This work
was supported by the Air Force Office of Scientific Research under Grant
AFOSR 78-3605.

K. D. Rines was with the School of Electrical Engineering, Purdue
University, West Lafayette, IN. He is now with The Analytic Sciences
Corporation, McLean Operation, 8301 Greensboro Drive, Suite 1200,
McLean, VA 22102.

N. C. Gallagher, Jr., is with the School of Electrical Engineering,
Purdue University, West Lafayette, IN 47907.


The design of two-dimensional quantizers for optimum
quantization is one area of interest. Consider the random
sequence $x_1, x_2, x_3, \ldots$ where the $x_i$ are all independent
and identically distributed. The traditional approach to
quantizing this sequence is to perform the quantization one
sample at a time using a one-dimensional quantizer. Much
of the early work in quantization theory has addressed this
problem. As a result the design and implementation of
optimum one-dimensional quantizers is straightforward. In
addition these quantizers are often able to operate at high
source rates. These properties make one-dimensional
quantization an attractive choice for quantizing the above
sequence. The advantage of quantizing the independent
identically distributed (i.i.d.) sequence in two or more
dimensions is discussed by Zador [1]. Simply stated, these results
indicate that the minimum obtainable per sample distortion
decreases as the quantizer dimension is increased.
Therefore, the potential exists to improve the performance
of digital encoders by replacing one-dimensional
quantizers with two-dimensional quantizers.

Zador's results include derivations of both the upper and
lower bounds on the distortion obtained when using an
optimum quantizer. Unfortunately, these results do not
provide insight into the structure of the quantizer. The
design and implementation of optimum two-dimensional
quantizers remains a largely unsolved problem. Recently
the design of two-dimensional quantizers has been
addressed. Computer algorithms for designing optimum
quantizers of two or more dimensions have been presented
by many authors, such as Linde et al. [2]. The algorithms
specify the optimum set of output vectors for the quantizer.
The optimum quantizer can then be implemented using a
search procedure. Having specified the output set, the
search is used to choose the output vector that is the
smallest distance from the input vector. However, this
implementation of the optimum quantizer may be difficult
or impossible to operate at high bit rates. Thus we are left
with the following dilemma. We can use a one-dimensional
quantizer that is easy to implement and suffer a high level
of distortion, or we can improve the distortion by using a
two-dimensional quantizer and accept the difficulties in the
implementation. To date the easy implementation of
one-dimensional quantizers has outweighed the theoretical
advantages of using two-dimensional quantizers.

0018-9448/82/0300-0232$00.75 ©1982 IEEE
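
The search procedure described above can be sketched in a few lines. This is only an illustration of a full search over a given output set: the function name and the four-vector output set are hypothetical, and Euclidean distance is assumed as the distortion measure.

```python
import math

def full_search_quantize(point, codebook):
    """Return the output vector in `codebook` nearest to `point`.

    Every output vector is examined, so the cost per input grows
    linearly with the size of the output set.
    """
    return min(codebook, key=lambda c: math.dist(point, c))

# Hypothetical four-vector output set, for illustration only.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
nearest = full_search_quantize((0.8, 0.1), codebook)
```

The per-input cost of such a search is what makes it hard to run at high source rates, which is the difficulty the text describes.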

In Section III we consider the design of the optimum
uniform two-dimensional quantizer. Gersho [3] has stated
that the optimum uniform two-dimensional quantizer is the
hexagonal quantizer. Using prequantization we construct a
simple design for the hexagonal quantizer which can
operate in real time. For our purposes we say that a quantizer
can operate in real time if the quantizer can operate at
approximately the same source rates as a one-dimensional
quantizer. Thus the prequantization design of the
hexagonal quantizer allows us to take advantage of the
performance improvements available with two-dimensional
quantizers while maintaining the easy implementation
characteristic of one-dimensional quantizers. This
hexagonal quantizer design is a significant result and
demonstrates the potential practical applications of
prequantization.

The design of suboptimum two-dimensional quantizers
has also been studied in the literature. This interest has
been motivated by the numerous examples in which the
data are physically generated in groups of two. These
studies note the difficulty in designing optimum quantizers
and explore the advantages of using suboptimum
two-dimensional quantizers. One example of data that are
generated in pairs is samples from a complex-valued discrete
Fourier transform. The design of suboptimum
two-dimensional quantizers for the discrete Fourier transform (DFT)
has been studied by Pearlman and Gray [4] and Gallagher [5].

In Sections IV and V we examine two examples of
suboptimum two-dimensional quantizers. The quantizers
are then redesigned using the prequantization approach. In
each case, the addition of prequantization substantially
reduces the mean-squared error of the quantizer.
These results further emphasize the usefulness of
prequantization.

II.  Prequantization 

The design of two-dimensional quantizers using
prequantization is illustrated in Fig. 1. The design consists of
a nonlinearity called a prequantizer preceding a
two-dimensional quantizer called an output quantizer. This
design approach is analogous to the implementation of a
quantizer using a search procedure. Let the quantizer to be
designed be described by a partitioning of the input space,
where all the input vectors contained within one cell of the
partition are mapped to the same output vector. The first
step in implementing a search is to define the set of
allowable quantizer output vectors. Then for each input
vector a search is conducted to find the output vector
assigned to that input vector by the partitioning.

Fig. 1. Two-dimensional quantizer design using prequantization.

Similarly the first step in designing a quantizer using
prequantization is to define the set of output vectors. This
is done using a two-dimensional quantizer that is called the
output quantizer. Thus we must determine the set of
output vectors specified by the quantizer being designed and
then build a two-dimensional quantizer with that same set
of output vectors. The problem of building the output
quantizer is somewhat simplified in the prequantization
approach since there are no constraints on how the output
quantizer partitions the input space.

The second step in the quantizer design is to require that
for each input vector the proper output vector is assigned.
For the quantizer being designed, let $A_i$ be an output
vector and $S_i$ be the set of all input vectors contained in the
cell of the partition corresponding to $A_i$. Similarly $A_i$ is also
an output vector of the output quantizer, and we let $T_i$ be
the set of all input vectors contained in the cell of the
partition corresponding to $A_i$. A nonlinearity called a
prequantizer is used to map $S_i$ into $T_i$ for all $i$. Thus the
prequantization design maps $S_i$ into $A_i$ by first mapping $S_i$
into $T_i$ with the prequantizer and then mapping $T_i$ into $A_i$
using the output quantizer. This prequantization design
procedure is illustrated with a simple example.

Consider the design of the two-dimensional quantizer
shown in Fig. 2. This quantizer has no significance other
than its usefulness in this example. Using the
prequantization procedure we must first build an output quantizer that
defines the same output set as in Fig. 2. The output
quantizer can be designed very simply using two univariate
equal-step-size quantizers. The partitioning of the output
quantizer is shown in Fig. 3. Having defined the output
vector set with the output quantizer, we now turn to the
design of the prequantizer. We observe that each partition
in Fig. 2 can be mapped into the corresponding cell in Fig.
3 by letting $y' = y$ and $x' = x - \Delta/4$. Thus the
prequantizer that completes the design of the quantizer in Fig. 2 is
given by

$$x' = x - \Delta/4, \qquad y' = y. \tag{1}$$


Fig. 2. Partitioning of a two-dimensional quantizer.

Fig. 3. Partitioning of output quantizer.

One advantage of using the prequantization design
approach is that often the quantizer can operate in real time.
Again we define a real-time quantizer as a quantizer that
can operate at approximately the same source rates as a
one-dimensional quantizer. In a number of examples the
output quantizer can be implemented using a combination
of one-dimensional quantizers and as a result can operate
in real time. It is also useful to note that the prequantizer is
defined only as a nonlinear mapping and may or may not
be a quantizer. This differs from the term pre-quantizer
used in the literature which refers to one quantizer
preceding another quantizer.

III.  Hexagonal  Quantization 

Gersho  has  argued  that  for  independent  samples  (at  high 
bit  rates)  the  optimum  uniform  two-dimensional  quantizer 
is  the  hexagonal  quantizer.  The  design  of  a  hexagonal 
quantizer  using  prequantization  is  given  here.  First  we 
attempt  to  build  a  two-dimensional  output  quantizer  that 
can  be  easily  implemented  and  operate  in  real  time.  One 
quantizer  meeting  these  requirements  is  a  scaled  version  of 
the  diamond  quantizer  given  below. 

Let the inputs to the two-dimensional output quantizer
be $x$ and $y$. The variables $x$ and $y$ are first encoded into two
new variables $w$ and $z$ by the linear transformation

$$w = x + \sqrt{3}\,y, \qquad z = x - \sqrt{3}\,y. \tag{2}$$

The variables $w$ and $z$ are quantized separately by
univariate quantizers with a uniform step-size $\Delta$. The outputs
of the output quantizer are then obtained using the linear
transformation

$$\hat{x} = \tfrac{1}{2}(\hat{w} + \hat{z}), \qquad \hat{y} = \tfrac{1}{2\sqrt{3}}(\hat{w} - \hat{z}). \tag{3}$$

The position of this quantizer in the hexagonal quantizer
design is shown in Fig. 4 and the partitioning of the scaled
diamond quantizer is given in Fig. 5. Having chosen the
output quantizer as defined in (2) and (3), we now turn to
the design of the prequantizer.

The prequantizer must map the hexagonal region
corresponding to each output into a scaled diamond region
corresponding to that same output. Consider the hexagonal
partition shown in Fig. 6. Assume $x$ is fixed and the pair
$(x, y)$ is contained within a given hexagonal partition. We
now pose a question: does there exist a value $x'$ such that
the pair $(x', y)$ is contained within the corresponding
diamond partition for all values of $y$? This approach is
illustrated with the following example. Let $x = x_1$ as shown
in Fig. 6 and let $y$ be in the range $-\Delta/(2\sqrt{3})$ to
$\Delta/(2\sqrt{3})$. In Fig. 6 we observe that the hexagonal quantizer
output will be $(0, 0)$ for all input pairs in the set
$\{(x_1, y): y_1 < y \le y_2\}$. Similarly in Fig. 5 we observe that
the scaled diamond quantizer output will be $(0, 0)$ for all input
pairs in the set $\{(x_2, y): y_1 < y \le y_2\}$. Therefore, if
$f(x_1) = x_2$, the quantizer in Fig. 4 will behave like the
hexagonal quantizer for all input pairs in the set
$\{(x_1, y): -\Delta/(2\sqrt{3}) < y < \Delta/(2\sqrt{3})\}$.
In fact, we can show that the quantizer in Fig. 4 behaves
like the hexagonal quantizer for all inputs in the set
$\{(x_1, y): -\infty \le y \le \infty\}$ when $f(x_1) = x_2$. Repeating this
example for all possible values of $x_1$, we obtain a
prequantizing function that maps the hexagonal region
corresponding to each output into a scaled diamond-shaped region
corresponding to that same output. The resulting
prequantizer function is given in (4).

$$f(x) = \begin{cases}
n\dfrac{\Delta}{2}, & n\dfrac{\Delta}{2} - \dfrac{\Delta}{6} \le x \le n\dfrac{\Delta}{2} + \dfrac{\Delta}{6}, \\[2ex]
3x - (2n+1)\dfrac{\Delta}{2}, & n\dfrac{\Delta}{2} + \dfrac{\Delta}{6} \le x \le (n+1)\dfrac{\Delta}{2} - \dfrac{\Delta}{6}.
\end{cases} \tag{4}$$

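The hexagonal design of (2)-(4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the univariate quantizers are assumed to round to the nearest multiple of the step size, the inverse transformation is assumed to be the inverse of (2), and all function names are hypothetical. The brute-force search is included only to compare the result with a nearest-neighbor search over the same lattice of output points.

```python
import math
import random

SQRT3 = math.sqrt(3.0)

def uniform_round(v, step):
    """Univariate uniform quantizer (assumed): nearest multiple of step."""
    return step * math.floor(v / step + 0.5)

def prequantizer(x, step):
    """Piecewise-linear prequantizing function f(x), read from (4)."""
    half = step / 2.0
    n = math.floor((x + step / 6.0) / half)  # index n of the interval in (4)
    if x - n * half <= step / 6.0:
        return n * half                      # flat segment
    return 3.0 * x - (2 * n + 1) * half      # slope-3 segment

def diamond_quantizer(x, y, step):
    """Scaled diamond quantizer: transform (2), round, inverse transform (3)."""
    w = uniform_round(x + SQRT3 * y, step)
    z = uniform_round(x - SQRT3 * y, step)
    return 0.5 * (w + z), (w - z) / (2.0 * SQRT3)

def hexagonal_quantizer(x, y, step):
    """Prequantizer on x followed by the diamond output quantizer."""
    return diamond_quantizer(prequantizer(x, step), y, step)

def brute_force_nearest(x, y, step, span=8):
    """Nearest output point found by exhaustive search (for comparison)."""
    points = (((i + j) * step / 2.0, (i - j) * step / (2.0 * SQRT3))
              for i in range(-span, span + 1)
              for j in range(-span, span + 1))
    return min(points, key=lambda p: math.dist((x, y), p))

def worst_excess_distance(trials=2000, step=1.0, seed=1):
    """Largest amount by which the prequantized output is farther from the
    input than the nearest output point, over random test inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x, y = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
        got = hexagonal_quantizer(x, y, step)
        best = brute_force_nearest(x, y, step)
        worst = max(worst, math.dist((x, y), got) - math.dist((x, y), best))
    return worst
```

The prequantized path needs only one piecewise-linear map and two scalar roundings per input pair, whereas the exhaustive search examines every candidate output point.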
Fig. 4. Prequantization design for hexagonal quantizer. Quantizer Q has
uniform step-size $\Delta$.

Fig. 5. Partitioning of scaled diamond quantizer.

IV. Prequantized Spectral Phase Coding

Spectral phase coding (SPC) is a robust suboptimum
technique for coding a nonstationary or large dynamic
range discrete-time series into digital form. SPC utilizes the
discrete Fourier transform and a two-dimensional
quantizer to obtain its robust characteristics. The SPC
algorithms are given here, while a detailed explanation of SPC
is available in [6]. The input is a discrete-time
complex-valued random sequence $\{a_n\}_{n=0}^{M-1}$. The spectral
magnitude $A_p$ and the spectral phase $\theta_p$ of the discrete
sequence are given below:

$$\{A_p e^{i\theta_p}\}_{p=0}^{M-1} = \mathrm{DFT}\left\{\{a_n\}_{n=0}^{M-1}\right\}. \tag{5}$$

SPC encodes the magnitude and phase of the spectrum by
forming the sequence $\{\psi_p\}_{p=0}^{2M-1}$ given by

$$\psi_p = \theta_p + \gamma_p, \qquad \psi_{p+M} = \theta_p - \gamma_p, \tag{6}$$

where

$$\gamma_p = \cos^{-1}(A_p / S)$$

and

$$S = \max_p A_p,$$

where the maximum is taken over $p = 0, 1, \ldots, M-1$. The
quantized sequence $\{\hat{\psi}_p\}$ is transmitted and used at the
receiver to recover the original discrete signal. The
reconstructed discrete sequence is

$$\{\hat{a}_n\}_{n=0}^{M-1} = \mathrm{DFT}^{-1}\left\{\left\{\tfrac{S}{2}\left(e^{i\hat{\psi}_p} + e^{i\hat{\psi}_{p+M}}\right)\right\}_{p=0}^{M-1}\right\}. \tag{7}$$

This equation can be rewritten in terms of the quantized
magnitude and phase components at the receiver

$$\{\hat{a}_n\}_{n=0}^{M-1} = \mathrm{DFT}^{-1}\left\{\left\{\tfrac{S}{2}\, e^{i\hat{\theta}_p}\left(e^{i\hat{\gamma}_p} + e^{-i\hat{\gamma}_p}\right)\right\}_{p=0}^{M-1}\right\} \tag{8}$$

where

$$\hat{\theta}_p = \tfrac{1}{2}(\hat{\psi}_p + \hat{\psi}_{p+M}), \qquad \hat{\gamma}_p = \tfrac{1}{2}(\hat{\psi}_p - \hat{\psi}_{p+M}). \tag{9}$$

Examining (6) and (9) we see that the variables $\theta_p$ and $\gamma_p$
are quantized by a two-dimensional quantizer called a
diamond quantizer. SPC utilizes the discrete Fourier
transform along with the diamond quantizer to code the
possibly nonstationary random sequence $\{a_n\}$ into a
well-behaved uniformly bounded sequence $\{\psi_p\}$. In many cases
the sequence $\{\psi_p\}$ is uniformly distributed from zero to $2\pi$.
As a consequence $\{\psi_p\}$ is quantized using a uniform
step-size quantizer.
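
The SPC steps can be sketched as an encode/decode pair. This is a minimal illustration under several assumptions that are not taken from the text: the spectral magnitude is carried by $\gamma_p = \cos^{-1}(A_p/S)$, the scale $S$ is available at the receiver, the forward DFT is unnormalized with a $1/M$ factor on the inverse, and the input is not identically zero. All function names are hypothetical.

```python
import cmath
import math

def dft(seq):
    """Unnormalized forward DFT (normalization is an assumption)."""
    m = len(seq)
    return [sum(a * cmath.exp(-2j * math.pi * p * n / m)
                for n, a in enumerate(seq)) for p in range(m)]

def idft(spec):
    """Inverse DFT with the 1/M factor."""
    m = len(spec)
    return [sum(c * cmath.exp(2j * math.pi * p * n / m)
                for p, c in enumerate(spec)) / m for n in range(m)]

def spc_encode(samples):
    """Map M complex samples to 2M phase values plus the scale S."""
    spectrum = dft(samples)
    scale = max(abs(c) for c in spectrum)              # S = max A_p
    theta = [cmath.phase(c) for c in spectrum]         # spectral phase
    gamma = [math.acos(min(1.0, abs(c) / scale)) for c in spectrum]
    psi = ([t + g for t, g in zip(theta, gamma)] +     # psi_p
           [t - g for t, g in zip(theta, gamma)])      # psi_{p+M}
    return psi, scale

def quantize_phase(psi, levels):
    """Uniform step-size quantizer on the phase sequence (assumed rounding)."""
    step = 2.0 * math.pi / levels
    return [step * math.floor(v / step + 0.5) for v in psi]

def spc_decode(psi, scale):
    """Rebuild the spectrum from the phase pairs and invert the DFT."""
    m = len(psi) // 2
    spectrum = [0.5 * scale * (cmath.exp(1j * psi[p]) + cmath.exp(1j * psi[p + m]))
                for p in range(m)]
    return idft(spectrum)
```

Without the quantization step, decoding the output of `spc_encode` returns the input sequence up to rounding error, since the sum of the two unit phasors equals $2\cos\gamma_p\, e^{i\theta_p}$. Inserting `quantize_phase` between the two calls models the transmitted sequence.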

Since SPC is a suboptimum quantizer we ask the
question: does there exist a prequantizing function that can
improve the SPC performance? The results from [4] and [7]
indicate that for polar quantization (at high bit rates) the
number of magnitude quantization levels $N_1$ and the
number of phase levels $N_2$ must be related by

$$N_2 \approx 2.6\, N_1 \tag{10}$$

for optimum performance. In SPC, $\gamma_p$ ranges from zero to
$\pi/2$ and $\theta_p$ ranges from zero to $2\pi$. Thus $\gamma_p$ has only
one-fourth the effective quantization levels of $\theta_p$. If $\gamma_p$ is
simply rescaled to range from zero to $\pi$, $\{a_n\}$ cannot be
uniquely recovered from the sequence $\{\psi_p\}$. However, using
prequantization the quantizer can be redesigned to
minimize the mean square error (mse) on $\gamma_p$ and improve the
SPC performance.

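We begin by defining the quantization errors for $\psi_p$ and
$\psi_{p+M}$:

$$\alpha_p = \psi_p - \hat{\psi}_p, \qquad \alpha_{p+M} = \psi_{p+M} - \hat{\psi}_{p+M}. \tag{11}$$

Assume the quantization takes place using an $N$-level equal
step-size quantizer. Then using a Fourier series expansion,
we can write

$$\alpha_p = -\frac{2}{N}\sum_{n=1}^{\infty}\frac{(-1)^n}{n}\sin nN\psi_p, \qquad
\alpha_{p+M} = -\frac{2}{N}\sum_{n=1}^{\infty}\frac{(-1)^n}{n}\sin nN\psi_{p+M}. \tag{12}$$

We now define the quantization errors for $\theta_p$ and $\gamma_p$ as

$$e_p = \theta_p - \hat{\theta}_p, \qquad d_p = \gamma_p - \hat{\gamma}_p. \tag{13}$$

Solving for $\theta_p$ and $\gamma_p$ in (6) and using this result with (9)
and (11) in (13) we obtain

$$e_p = \tfrac{1}{2}(\alpha_p + \alpha_{p+M}), \qquad d_p = \tfrac{1}{2}(\alpha_p - \alpha_{p+M}). \tag{14}$$

Then substituting (12) into (14) and using a trigonometric
identity we can write

$$e_p = -\frac{2}{N}\sum_{n=1}^{\infty}\frac{(-1)^n}{n}\sin nN\theta_p \cos nN\gamma_p, \qquad
d_p = -\frac{2}{N}\sum_{n=1}^{\infty}\frac{(-1)^n}{n}\cos nN\theta_p \sin nN\gamma_p. \tag{15}$$

Thus the mse on $\gamma_p$ is

$$E\{d_p^2\} = \frac{4}{N^2}\sum_{n=1}^{\infty}\sum_{m=1}^{\infty}\frac{(-1)^{n+m}}{nm}\,
E\{\cos nN\theta_p \cos mN\theta_p \sin nN\gamma_p \sin mN\gamma_p\}. \tag{16}$$

For a large number of quantization levels $N$, the mse on $\gamma_p$
becomes

$$E\{d_p^2\} \approx \frac{1}{N^2}\sum_{n=1}^{\infty}\frac{1}{n^2}\left(1 + E\{\cos 2nN\theta_p\}\right). \tag{17}$$

From (17) we find that $E\{d_p^2\}$ is minimized for

$$\theta_p = \theta_p' = k\frac{\pi}{N} + \frac{\pi}{2N}, \tag{18}$$

where $k = 0, 1, \ldots, 2N-1$.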
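
As a check on the step from (17) to (18), the following worked calculation may help. It assumes that (17) reads $E\{d_p^2\} \approx N^{-2}\sum_n n^{-2}(1 + E\{\cos 2nN\theta_p\})$ and that $\theta_p$ is held at a fixed value $\theta$, so that the expectation reduces to the cosine itself. The closed form used is the standard Fourier series of a quadratic on $[0, 2\pi]$.

```latex
\sum_{n=1}^{\infty} \frac{\cos nx}{n^{2}}
  = \frac{\pi^{2}}{6} - \frac{\pi x}{2} + \frac{x^{2}}{4},
  \qquad 0 \le x \le 2\pi .
% With x = 2N\theta (mod 2\pi), the derivative -\pi/2 + x/2 vanishes at x = \pi,
% where the sum takes its minimum value -\pi^{2}/12. Hence
\theta' = \frac{k\pi}{N} + \frac{\pi}{2N},
\qquad
E\{d_p^{2}\}\Big|_{\theta = \theta'}
  \approx \frac{1}{N^{2}}\left(\frac{\pi^{2}}{6} - \frac{\pi^{2}}{12}\right)
  = \frac{\pi^{2}}{12 N^{2}} .
% For comparison, if E\{\cos 2nN\theta_p\} = 0 (for example, \theta_p uniform),
% the same expression gives \pi^{2}/(6N^{2}).
```

Under these assumptions, fixing $\theta_p$ at the levels in (18) halves the large-$N$ mse on $\gamma_p$ relative to the case where the cosine term averages to zero.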

Applying these results, we propose the following coding
scheme called prequantized spectral phase coding (PQSPC).
First obtain $\theta_p$ and $\gamma_p$ as with SPC. The values $\{\theta_p\}$ are
then quantized with output levels $k\pi/N + \pi/(2N)$ for
$k = 0, 1, \ldots, 2N-1$. The quantizer output $\{\hat{\theta}_p\}$ is then used to
form the sequence $\{\psi_p\}$ and the rest of the procedure is
identical to SPC. Figs. 7 and 8 depict the quantization
region shapes for SPC and PQSPC, respectively.
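
The effect of the prequantization step on the error in $\gamma_p$ can be illustrated with a small Monte Carlo sketch. The setup is an assumption made for illustration only: $\theta_p$ and $\gamma_p$ are drawn independently and uniformly, the phase quantizer rounds to the nearest multiple of $2\pi/N$, and phase wraparound is ignored. The function names are hypothetical, and this is not the simulation behind the results reported in the text.

```python
import math
import random

def round_to_step(v, step):
    """Uniform quantizer (assumed): nearest multiple of step."""
    return step * math.floor(v / step + 0.5)

def gamma_mse(levels, trials, prequantize, rng):
    """Estimate the mean square error on gamma for SPC or PQSPC."""
    step = 2.0 * math.pi / levels          # step of the N-level phase quantizer
    half = step / 2.0                      # spacing pi/N of the theta levels
    total = 0.0
    for _ in range(trials):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        gamma = rng.uniform(0.0, math.pi / 2.0)
        if prequantize:
            # PQSPC: move theta to the nearest level k*pi/N + pi/(2N)
            theta = half * math.floor(theta / half) + half / 2.0
        psi_a = round_to_step(theta + gamma, step)   # quantized psi_p
        psi_b = round_to_step(theta - gamma, step)   # quantized psi_{p+M}
        gamma_hat = 0.5 * (psi_a - psi_b)            # receiver estimate of gamma
        total += (gamma - gamma_hat) ** 2
    return total / trials

def compare(levels=32, trials=20000, seed=0):
    """Return (SPC mse, PQSPC mse) on gamma for the same random generator."""
    rng = random.Random(seed)
    return (gamma_mse(levels, trials, False, rng),
            gamma_mse(levels, trials, True, rng))
```

Under these uniform assumptions the two estimates should come out near $\Delta_\psi^2/24$ and $\Delta_\psi^2/48$ with $\Delta_\psi = 2\pi/N$, that is, a halving of the error on $\gamma_p$. The improvement in overall normalized mse reported in the text is smaller because it also includes the error on $\theta_p$.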

In [7] SPC was compared with the optimum unit
variance Gaussian quantizer (O.G.Q.). We now present a
similar comparison to evaluate the performance of PQSPC.
The normalized mse performances of the optimum unit
variance Gaussian quantizer, SPC and PQSPC are
compared in Figs. 9 and 10. All the quantizers have 32 levels (5
bits/sample) and the block size for SPC and PQSPC is 64.
In Fig. 9 the normalized mse of the three quantizers with a
zero-mean Gaussian input is given as a function of the
input variance. The normalized mse of the quantizers with
a zero-mean Laplacian input is given as a function of the
input variance in Fig. 10.


Fig. 7. Shape of quantizing regions for $A_p e^{i\theta_p}$ using SPC.

Fig. 8. Shape of quantizing regions for $A_p e^{i\theta_p}$ using PQSPC.

Fig. 9. Comparison of normalized mse between optimum unit variance
Gaussian quantizer, SPC, and prequantized SPC with a zero-mean
Gaussian input.


In terms of normalized mse, PQSPC offers an
improvement over SPC of 16.3 percent for the Gaussian input
densities and 16.0 percent for the Laplacian densities. The
improvement for nonsymmetric input densities can be even
more dramatic. In the case of the one-sided exponential
density PQSPC offers a 47.5 percent reduction in
normalized mean-squared error over that of SPC. A desirable
characteristic of SPC is its relative insensitivity to a change
in signal power or statistics. Figs. 9 and 10 demonstrate
that PQSPC shares this characteristic. In fact, the
normalized mse of PQSPC remains constant for any change in
the signal variance and changes only 1.4 percent when the
input statistics are changed from Gaussian to Laplacian.

Fig. 10. Comparison of normalized mse between optimum unit variance
Gaussian quantizer, SPC, and prequantized SPC with a zero-mean
Laplacian input.

V. Hsueh-Sawchuk Holograms

The wide applicability of prequantization is further
illustrated by considering an example from computer-generated
holography. In this section we present the results of
using prequantization in Hsueh-Sawchuk computer-generated
holograms. A detailed analysis of prequantization in
Hsueh-Sawchuk holograms is given in [9] and a good
summary of computer-generated holography is available in
[10].

The Hsueh-Sawchuk hologram encodes the discrete
Fourier transform of the desired holographic image into a
binary pattern. This binary pattern is then written onto the
hologram using a pattern generator with finite resolution.
The finite resolution of the pattern generator can be
modeled as a quantizer. Thus the complex-valued discrete
Fourier transform of the holographic image is effectively
quantized by a two-dimensional quantizer. This quantization
can be improved by using prequantization.

The normalized mean square quantization error for the
Hsueh-Sawchuk hologram in Fig. 11 is $6.82 \times 10^{-2}$. This
compares with a mean square error of $5.25 \times 10^{-2}$ for the
prequantized Hsueh-Sawchuk hologram. Thus the
quantization error is improved 23 percent by the addition of
prequantization. The improved quantization error can also
be seen by comparing Figs. 11 and 12. The quantization
error can be approximated as a white additive noise which
appears as the high frequency background noise in the
holograms. We see the prequantized hologram in Fig. 12
has less background noise than the hologram in Fig. 11.
Thus the prequantization has reduced the quantization
error without any harmful effects on the holographic image
itself.

Fig. 11. Hsueh-Sawchuk hologram.


Fig. 12. Hsueh-Sawchuk hologram with prequantization.

VI. Discussion

We have presented a new approach to the design of
two-dimensional quantizers. The usefulness of the
prequantization approach has been demonstrated in three
examples. The hexagonal quantizer design is of particular
importance. The prequantization design makes the use of
the hexagonal quantizer with its theoretical advantages
more practical. Existing two-dimensional quantizers were
examined in the two other examples. In each case
prequantization reduced the quantization error while retaining the
other important system characteristics.

At this stage the work on the prequantization design
approach is incomplete. Presently there are no guidelines
as to how or when prequantization can be used to design
two-dimensional quantizers. However, the results presented
here indicate that this approach may deserve some
consideration whenever a two-dimensional quantizer is to be
implemented.


References 

[1] P. Zador, "Development and evaluation of procedures for quantizing
multivariate distributions," Ph.D. dissertation, Stanford University,
CA, University Microfilm no. 64-9855, 1964.

[2] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector
quantizer design," IEEE Trans. Commun., vol. COM-28, pp. 84-95,
Jan. 1980.

[3] A. Gersho, "Asymptotically optimal block quantization," IEEE
Trans. Inform. Theory, vol. IT-25, pp. 373-380, July 1979.

[4] W. A. Pearlman and R. M. Gray, "Source coding of the discrete
Fourier transform," IEEE Trans. Inform. Theory, vol. IT-24, pp.
683-692, Nov. 1978.

[5] N. C. Gallagher, Jr., "Quantizing schemes for the discrete Fourier
transform of a random time-series," IEEE Trans. Inform. Theory,
vol. IT-24, pp. 156-163, Mar. 1978.

[6] N. C. Gallagher, Jr., "Spectral phase coding," Proc. of Johns
Hopkins CISS, Apr. 1976.

[7] J. A. Bucklew and N. C. Gallagher, Jr., "Quantization schemes for
bivariate Gaussian random variables," IEEE Trans. Inform. Theory,
vol. IT-25, pp. 537-543, Sept. 1979.

[8] J. Max, "Quantizing for minimum distortion," IRE Trans. Inform.
Theory, vol. IT-6, pp. 7-12, Mar. 1960.

[9] K. Rines and N. C. Gallagher, Jr., "Reducing quantization error in
Hsueh-Sawchuk holograms," Applied Optics, vol. 20, pp. 2008-2010,
June 1981.

[10] W. H. Lee, "Computer generated holograms: Techniques and
applications," in Progress in Optics, vol. 16, Amsterdam, The
Netherlands: North Holland, 1977.


